Topics in Markov chains: mixing and escape rate

Julia Komjathy and Yuval Peres 


Abstract. These are the notes for the minicourse on Markov chains delivered 
at the Saint Petersburg Summer School, June 2012. The main emphasis is on 
methods for estimating mixing times (for finite chains) and escape rates (for 
infinite chains). Lamplighter groups are key examples in both topics and the
Varopoulos-Carne long range estimate is useful in both settings.


1. Preliminaries 

We start with preliminary notions necessary for the analysis of mixing and 
relaxation time of Markov chains. For much more on this topic see the books 

[AF02, LPW08]. 


1.1. Total variation distance and coupling. We start with the definition 
of total variation distance and coupling of two probability measures: 

Definition 1.1. Let $S$ be a state space, and let $\mu$ and $\nu$ be two probability
measures defined on $S$. Then the total variation distance between $\mu$ and $\nu$ is
defined as

$\|\mu - \nu\|_{TV} = \max_{A \subseteq S} |\mu(A) - \nu(A)|.$

Definition 1.2. A coupling of two probability measures $\mu$ and $\nu$ on $S$ is a
pair of random variables $(X,Y)$ having joint distribution $q$ on $S \times S$ such that
the marginal distributions are $P[X = x] = \sum_{y \in S} q(x,y) = \mu(x)$ and
$P[Y = y] = \sum_{x \in S} q(x,y) = \nu(y)$ for every $x, y \in S$.


1991 Mathematics Subject Classification. Primary 60J10, 160D05, 37A25. 

Key words and phrases. Random walk, generalized lamplighter walk, wreath product, mixing
time, relaxation time, Varopoulos-Carne long range estimates.

J. Komjathy was supported by the grant KTIA-OTKA CNK 77778, funded by the Hungarian
National Development Agency (NFU) from a source provided by KTIA.




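Then the following expressions give equivalent characterizations of the total variation
distance $\|\mu - \nu\|_{TV}$:

(1.1)   $\max_{A \subseteq S} |\mu(A) - \nu(A)|$

(1.2)   $\frac{1}{2} \sum_{x \in S} |\mu(x) - \nu(x)|$

(1.3)   $\sum_{x \in S:\ \mu(x) > \nu(x)} (\mu(x) - \nu(x))$

(1.4)   $\inf \{ P[X \ne Y] : (X,Y) \text{ is a coupling of } \mu \text{ and } \nu \}$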


Proof. It is intuitively clear that the set $B := \{x : \mu(x) > \nu(x)\}$ or its
complement maximizes the right hand side in Definition 1.1. To give a formal
proof, take $A \subseteq S$. From the definition of $B$ it follows that

(1.5)   $\mu(A) - \nu(A) \le \mu(A \cap B) - \nu(A \cap B) \le \mu(B) - \nu(B).$

This proves that (1.1) $\le$ (1.3). But if we take $A = B$, then the maximum is attained,
i.e. (1.1) = (1.3). By the same reasoning, with $B^c := S \setminus B$ we also have

(1.6)   $\nu(A) - \mu(A) \le \nu(A \cap B^c) - \mu(A \cap B^c) \le \nu(B^c) - \mu(B^c).$

Note that since $\mu(B^c) = 1 - \mu(B)$ and $\nu(B^c) = 1 - \nu(B)$, the right hand sides
of (1.5) and (1.6) coincide, thus yielding

$\max_{A \subseteq S} |\mu(A) - \nu(A)| = \frac{1}{2} \left( \mu(B) - \nu(B) + \nu(B^c) - \mu(B^c) \right) = \frac{1}{2} \sum_{x \in S} |\mu(x) - \nu(x)|,$

proving (1.1) = (1.2).

To see that (1.1) $\le$ (1.4), we write, for any coupling $(X,Y)$ of $\mu$ and $\nu$ and any $A \subseteq S$,

$\mu(A) - \nu(A) = P[X \in A] - P[Y \in A] \le P[X \in A, Y \notin A] \le P[X \ne Y].$


For the other direction we construct a coupling for which the infimum is attained.
Intuitively, what we do is pack as much mass into the diagonal $q(x,x)$ as we can,
such that we still maintain the correct marginal measures. More formally, let us
define, for $x \ne y$,

$q(x,x) := \min\{\mu(x), \nu(x)\},$

$q(x,y) := 0$   if $q(x,x) = \mu(x)$ or $q(y,y) = \nu(y)$,

$q(x,y) := \frac{(\mu(x) - \nu(x))(\nu(y) - \mu(y))}{1 - \sum_{z} q(z,z)}$   if $q(x,x) = \nu(x)$ and $q(y,y) = \mu(y)$.

Intuitively, we put the maximal possible weight on the diagonal of $q$ (which is
$\min\{\mu(x), \nu(x)\}$), and then we put zeros in the corresponding column or row, depending
on whether the minimum is $\mu(x)$ or $\nu(x)$. Finally, we fill out the rest with a conditionally
independent choice, i.e. on $B \times B^c$ we distribute $(\mu(x) - \nu(x)) \cdot (\nu(y) - \mu(y)) > 0$
with the normalizing factor $1 - \sum_z q(z,z)$. Mind that this is not the only way of
constructing the coupling. Checking that the marginals are correct is left to the reader.
With this particular coupling, (1.4) becomes

(1.4) $\le P(X \ne Y) = 1 - \sum_{x} q(x,x) = 1 - \sum_{x} \min\{\mu(x), \nu(x)\}$

$= \sum_{x} \mu(x) - \left( \sum_{x:\ \mu(x) > \nu(x)} \nu(x) + \sum_{x:\ \mu(x) \le \nu(x)} \mu(x) \right)$

$= \sum_{x:\ \mu(x) > \nu(x)} [\mu(x) - \nu(x)] =$ (1.3).

With this we have (1.4) $\le$ (1.3) = (1.1), finishing the proof. □
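The coupling built in the proof can be checked on a small example. The following is a minimal sketch, assuming the measures are given as dictionaries of exact fractions; the function name `maximal_coupling` and the data layout are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def maximal_coupling(mu, nu):
    """Joint law q on S x S from the proof, as a dict of dicts, plus the TV distance.

    mu and nu are dicts over the same keys with Fraction values summing to 1.
    """
    states = list(mu)
    # Diagonal: pack as much mass as possible, q(x,x) = min(mu(x), nu(x)).
    diag = {x: min(mu[x], nu[x]) for x in states}
    # Normalising factor 1 - sum_z q(z,z); it equals the total variation distance.
    tv = 1 - sum(diag.values())
    q = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        q[x][x] = diag[x]
        for y in states:
            # Off-diagonal mass only goes from B = {mu > nu} to its complement.
            if x != y and mu[x] > nu[x] and nu[y] > mu[y]:
                q[x][y] = (mu[x] - nu[x]) * (nu[y] - mu[y]) / tv
    return q, tv
```

Summing the rows of `q` should return `mu`, summing the columns should return `nu`, and the off-diagonal mass should equal characterisation (1.2).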

1.2. Mixing in total variation distance. Let $X$ be a Markov chain on state
space $S$ with transition matrix $P$ and stationary measure $\pi$ on $S$, that is, $\pi P = \pi$.
If $P$ is irreducible and aperiodic, then the measure $\mu_t(y) = P^t(x,y)$ converges
to the stationary measure exponentially fast, i.e. there exist $C > 0$ and $\alpha \in (0,1)$ such
that

$\|P^t(x,\cdot) - \pi(\cdot)\|_{TV} \le C \alpha^t.$

These asymptotics hold for a single chain as the time $t$ tends to infinity. However,
we are rather interested in the finite time behavior of a sequence of Markov chains,
i.e. how long one has to run the Markov chain, as a function of $|S|$, to get $\varepsilon$-close
to the stationary measure, for fixed $\varepsilon$.

Thus, let us define

(1.7)   $d_x(t) := \|P^t(x,\cdot) - \pi(\cdot)\|_{TV}; \qquad d(t) := \max_{x \in S} d_x(t).$

Then, the $\varepsilon$-mixing time of a Markov chain on a graph $G$ is defined as

(1.8)   $t_{mix}(G, \varepsilon) := \min\{t \ge 0 : d(t) \le \varepsilon\}.$

Throughout, we set $t_{mix}(G) := t_{mix}(G, 1/4)$. The characterisation (1.4) suggests that
sometimes it is more convenient to work with chains started from two different
initial states, so let us define

$\bar d(t) := \max_{x,y \in S} \|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV}.$

x,yG6 

Then, we have the following comparison: 

Lemma 1.3. With the above definitions,

(1.9)   $d(t) \le \bar d(t) \le 2 d(t).$

Further, the function $\bar d(t)$ is submultiplicative, i.e.

(1.10)   $\bar d(t + s) \le \bar d(t) \bar d(s),$

and combining yields

(1.11)   $d(kt) \le 2^k d(t)^k.$

Proof. We only prove (1.9) here. The proof of (1.10) is the proof of Lemma
4.12 in [LPW08], and (1.11) is an easy combination of the first two statements of
the lemma. To prove the second inequality in (1.9), we use the triangle inequality:
for every $x, y \in S$,

$\|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \le \|P^t(x,\cdot) - \pi(\cdot)\|_{TV} + \|\pi(\cdot) - P^t(y,\cdot)\|_{TV} \le 2 d(t),$

and taking the maximum over $x, y$ gives $\bar d(t) \le 2 d(t)$. For the first inequality we can use that $\pi P^t = \pi$ to get

$d_x(t) = \max_{A} |P^t(x,A) - \pi(A)| = \max_{A} \left| \sum_{y \in S} \pi(y) \left[ P^t(x,A) - P^t(y,A) \right] \right|.$

Now, by the triangle inequality the right hand side is at most

$\max_{A} \sum_{y \in S} \pi(y) |P^t(x,A) - P^t(y,A)| \le \sum_{y \in S} \pi(y) \max_{A} |P^t(x,A) - P^t(y,A)| \le \bar d(t).$

□
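As a numerical sanity check of Lemma 1.3, one can compute $d(t)$ and $\bar d(t)$ directly for a small chain. The sketch below uses the lazy walk on a cycle as an assumed test case; the helper names (`lazy_cycle`, `mat_mul`, `tv`, `d_and_dbar`) are illustrative and floating point arithmetic is used, so comparisons should allow a small tolerance.

```python
def lazy_cycle(n):
    """Transition matrix of the lazy simple random walk on the n-cycle."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] += 0.5
        P[x][(x + 1) % n] += 0.25
        P[x][(x - 1) % n] += 0.25
    return P

def mat_mul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv(p, q):
    """Total variation distance via characterisation (1.2)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def d_and_dbar(P, pi, t_max):
    """Return the list [(d(t), dbar(t)) for t = 0, ..., t_max]."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    out = []
    for _ in range(t_max + 1):
        d = max(tv(row, pi) for row in Pt)
        dbar = max(tv(Pt[x], Pt[y]) for x in range(n) for y in range(n))
        out.append((d, dbar))
        Pt = mat_mul(Pt, P)
    return out
```

For the cycle the stationary law is uniform, so `d_and_dbar(lazy_cycle(7), [1/7] * 7, 20)` gives pairs that can be compared against (1.9) and (1.10).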

The quantity $\bar d(t)$ is extremely useful, since it allows us to relate the mixing
time of the chain to the tail behavior of the so-called coupling time: given a
coupling $(X_t, Y_t)$ of $P^t(x,\cdot)$ and $P^t(y,\cdot)$, let us define

$\tau_{couple} := \min\{t : X_t = Y_t\}.$

Then, provided the two chains stay together after they meet, we have

(1.12)   $d(t) \le \bar d(t) \le \max_{x,y} P[X_t \ne Y_t] = \max_{x,y} P[\tau_{couple} > t].$

With all these prerequisites in hand, we can state and prove our first
theorem:

Theorem 1.4. The mixing time of the lazy random walk on $C_n$, the cycle on $n$ vertices, is bounded from
above by

$t_{mix}(C_n) \le n^2.$

Proof. We will construct a coupling of the measures $P^t(x,\cdot)$ and $P^t(y,\cdot)$
and use (1.12) to estimate $d(t)$. Note that $P^t(x,\cdot)$ and $P^t(y,\cdot)$ are the transition
measures of two lazy random walks, say $X_t$ and $Y_t$, with $X_0 = x$ and $Y_0 = y$.
Thus, we construct a coupling of $(X_t, Y_t)$ as follows: we couple the increments of
the walks, as long as $X_t \ne Y_t$ holds:

$P(X_t - X_{t-1} = 0,\ Y_t - Y_{t-1} = +1) = \tfrac14; \qquad P(X_t - X_{t-1} = 0,\ Y_t - Y_{t-1} = -1) = \tfrac14;$
$P(X_t - X_{t-1} = +1,\ Y_t - Y_{t-1} = 0) = \tfrac14; \qquad P(X_t - X_{t-1} = -1,\ Y_t - Y_{t-1} = 0) = \tfrac14.$

If the two walks meet, then they stay together from that point on. It is easy
to check that the marginals of the two walks are correct. The advantage of this
coupling is that before collision the two walks never move at the same time, i.e.,
the clockwise distance $D_t = X_t - Y_t$ changes at each step by $+1$ or $-1$. This means
that $D_t$ is doing a simple (non-lazy) symmetric random walk on $\{0, 1, \dots, n\}$ with
$D_0 := k \in \{1, \dots, n-1\}$, and we are waiting until it hits $0$ or $n$. This is exactly the
well known Gambler's ruin problem. The coupling time is then $\tau_{0,n}$, the hitting
time of the set $\{0, n\}$. We can use the martingale $D_t$ and optional stopping to
calculate the exit probabilities:

$k = E_k[D_0] = E_k[D_{\tau_{0,n}}] = P_k[D_{\tau_{0,n}} = n] \cdot n,$

from which $P_k[D_{\tau_{0,n}} = n] = k/n$. Then, $D_t^2 - t$ is also a martingale (checking this is
left to the reader as an exercise), and using the previous calculation and optional
stopping gives

$E_k[\tau_{0,n}] = k(n - k) \le n^2/4.$
Combining Lemma 1.3, the characterisation (1.4) of the total variation distance
and the previous calculations with Markov's inequality, we arrive at the following
sequence of inequalities:

$d(t) \le \bar d(t) \le \max_{x,y} P[X_t \ne Y_t] = \max_{k} P_k[\tau_{0,n} > t] \le \max_{k} \frac{E_k[\tau_{0,n}]}{t} \le \frac{n^2}{4t}.$

Now let us set $t = n^2$; then we get $d(n^2) \le 1/4$, implying $t_{mix}(C_n) \le n^2$. □
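The coupling used in this proof is easy to simulate. The following is a minimal Monte Carlo sketch under the assumption that vertices of the cycle are labelled $0, \dots, n-1$; the function name `coupling_time` and its arguments are illustrative. At each step exactly one of the two walks moves by $\pm 1$, each of the four possibilities having probability $1/4$, as in the display above.

```python
import random

def coupling_time(n, x, y, rng):
    """Number of steps until the two coupled lazy walks on the n-cycle meet."""
    t = 0
    while x != y:
        step = rng.choice((1, -1))     # direction of the single move
        if rng.random() < 0.5:         # X moves and Y stays put ...
            x = (x + step) % n
        else:                          # ... or Y moves and X stays put
            y = (y + step) % n
        t += 1
    return t
```

Averaging `coupling_time(n, k, 0, rng)` over many runs should give a value close to the Gambler's ruin expectation $k(n-k)$ computed in the proof.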

A similar coupling can be used to give an upper bound on the mixing time on
the $d$-dimensional tori:

Theorem 1.5. The total variation mixing time on the $d$-dimensional torus $\mathbb{Z}_n^d$
is bounded from above by

(1.13)   $t_{mix}(\mathbb{Z}_n^d) \le 3 d \log d \cdot n^2.$

Proof. We couple the two walks $X_t = (X_t^1, X_t^2, \dots, X_t^d)$ and $Y_t = (Y_t^1, Y_t^2, \dots, Y_t^d)$
coordinate-wise with the same coupling as in the proof of Theorem 1.4. More precisely,
at each step we first pick a uniform number $U_t \in \{1, 2, \dots, d\}$ independently
of everything else, and then we check whether the corresponding coordinates $X_t^{U_t}, Y_t^{U_t}$
coincide or not. If so, we move both walks with the same increment: $0$, $+1$ or $-1$
with probabilities $1/2, 1/4, 1/4$, respectively. If $X_t^{U_t} \ne Y_t^{U_t}$, then we apply the coupling
described in the proof of Theorem 1.4 for the $U_t$-th coordinate. Let $D_t^i$ denote the
clockwise difference between $X_t^i$ and $Y_t^i$, and let $\tau_i$ denote the first time when $D_t^i$ hits
$\{0, n\}$. Since each coordinate $i$ has a Geometric($1/d$) waiting time for its next move,
the marginal distribution of each $\tau_i$ can be written as

$\tau_i \overset{d}{=} \sum_{j=1}^{\tau^{(i)}_{0,n}} Z_j$

with $Z_j \sim \mathrm{Geo}(1/d)$ i.i.d., and $\tau^{(i)}_{0,n}$ distributed as $\tau_{0,n}$ in the proof of Theorem 1.4. This gives
that $E[\tau_i] \le d n^2 / 4$. Note that this bound holds for every starting point $x_i, y_i$. So we
can run the chain in blocks of $d n^2 / 2$ and then in each block we hit the set $\{0, n\}$
with probability at least $1/2$ by Markov's inequality. Hence the hitting time of the set
$\{0, n\}$ is stochastically dominated by a random variable of the form $\frac{1}{2} d n^2 \cdot \mathrm{Geo}(1/2)$.
This yields the bound

$P[\tau_i > t] \le 2 \left( \tfrac12 \right)^{2t/(d n^2)},$

where the factor 2 comes from ignoring the integer part of $2t/(d n^2)$. Set $t = 3 d \log d \cdot n^2$;
then, for all $d \ge 2$:

$P[X_t \ne Y_t] = P[\exists i : \tau_i > t] \le d \cdot P[\tau_i > t] \le 2d \left( \tfrac12 \right)^{6 \log d} = 2 d^{1 - 6 \log 2} \le \tfrac14.$

Hence we have $t_{mix}(\mathbb{Z}_n^d) \le 3 d \log d \cdot n^2$, finishing the proof. □

1.3. Strong stationary times. In many cases the following random times
give a useful bound on mixing times:

Definition 1.6. A randomized stopping time $\tau$ is called a strong stationary
time for the Markov chain $X_t$ on $G$ if

(1.14)   $P_x[X_\tau = y,\ \tau = t] = \pi(y) P_x[\tau = t],$

that is, the position of the walk when it stops at $\tau$ is distributed according to $\pi$
and is independent of the value of $\tau$.

The adjective randomized means that the stopping time can depend on some extra
randomness, not just on the trajectories of the Markov chain; for a precise
definition see [LPW08, Section 6.2.2].

Definition 1.7. A state $h(x) \in V(G)$ is called a halting state for a stopping
time $\tau$ and initial state $x$ if $\{X_t = h(x)\}$ implies $\{\tau \le t\}$.

Strong stationary times are useful since they are closely related to another
notion of distance from the stationary measure. We define

Definition 1.8. The separation distance $s(t)$ is defined as

(1.15)   $s(t) := \max_{x \in S} s_x(t)$   with   $s_x(t) := \max_{y \in S} \left( 1 - \frac{P^t(x,y)}{\pi(y)} \right).$

We mention that the separation distance is not a metric.

The relation between the separation distance and any strong stationary time $\tau$
is the following inequality from [AF02] or [LPW08, Lemma 6.11]:

(1.16)   $\forall x \in S: \quad s_x(t) \le P_x(\tau > t).$

The proof is just two lines, so we include it here for the reader's convenience: for
any $y$ we have

(1.17)   $1 - \frac{P^t(x,y)}{\pi(y)} \le 1 - \frac{P_x[X_t = y,\ \tau \le t]}{\pi(y)}.$

Now (1.14) implies that the last expression equals

$1 - \frac{\pi(y) P_x[\tau \le t]}{\pi(y)} = P_x[\tau > t].$

Later we will need a slightly stronger result than (1.16): from (1.17)
it follows that if $\tau$ has a halting state $h(x)$ for $x$, then putting $y = h(x)$ yields
that equality holds in (1.16). Unfortunately, the statement cannot be reversed:
the state $h(x,t)$ maximizing the separation distance at time $t$ can also depend on
$t$, and thus the existence of a halting state is not necessarily needed to get equality
in (1.16).

On the other hand, one can always construct $\tau$ such that (1.16) holds with
equality for every $x \in S$. This $\tau$ does not necessarily possess halting states. This is
one of the main ingredients of our proofs in Section 2, so we cite it as a theorem
(with adjusted notation).


Theorem 1.9. [Aldous, Diaconis][AD86, Proposition 3.2] Let $(X_t, t \ge 0)$ be
an irreducible aperiodic Markov chain on a finite state space $S$ with initial state $x$
and stationary distribution $\pi$, and let $s_x(t)$ be the separation distance defined as in
(1.15). Then

(1) if $\tau$ is a strong stationary time for $X_t$, then $s_x(t) \le P_x(\tau > t)$ for all
$t \ge 0$.

(2) Conversely, there exists a strong stationary time $\tau$ such that $s_x(t) =
P_x(\tau > t)$ holds with equality.

Combining these, we will call a strong stationary time $\tau$ separation optimal if
it achieves equality in (1.16). Mind that every stopping time possessing halting
states is separation optimal, but not the other way round.

The next lemma relates the total variation and the separation distance:

Lemma 1.10. For any reversible Markov chain and any state $x \in S$, the separation
distance from initial vertex $x$ satisfies:

(1.18)   $d_x(t) \le s_x(t),$

(1.19)   $s_x(2t) \le 4 d(t).$

Proof. For a short proof of (1.18) see [AF02] or [LPW08, Lemma 6.13], and
combine [LPW08, Lemma 19.3] with a triangle inequality to conclude (1.19). Here
we write the proofs for the reader's convenience. We have

$d_x(t) = \sum_{y \in S:\ P^t(x,y) < \pi(y)} \left[ \pi(y) - P^t(x,y) \right] = \sum_{y \in S:\ P^t(x,y) < \pi(y)} \pi(y) \left[ 1 - \frac{P^t(x,y)}{\pi(y)} \right] \le \max_{y} \left[ 1 - \frac{P^t(x,y)}{\pi(y)} \right] = s_x(t).$

To see (1.19), we note that reversibility means that $P^t(z,y)/\pi(y) = P^t(y,z)/\pi(z)$.
Hence we have

$\frac{P^{2t}(x,y)}{\pi(y)} = \sum_{z \in S} \frac{P^t(x,z) P^t(z,y)}{\pi(y)} = \sum_{z \in S} \frac{P^t(x,z) P^t(y,z)}{\pi(z)}.$

Applying Cauchy-Schwarz to the right hand side implies

$\frac{P^{2t}(x,y)}{\pi(y)} \ge \left( \sum_{z \in S} \sqrt{P^t(x,z) P^t(y,z)} \right)^2 \ge \left( \sum_{z \in S} P^t(x,z) \wedge P^t(y,z) \right)^2.$

Recall (1.4), i.e.

$\sum_{z \in S} \mu(z) \wedge \nu(z) = 1 - \|\mu - \nu\|_{TV}.$

Combining this with the previous calculation results in

$1 - \frac{P^{2t}(x,y)}{\pi(y)} \le 1 - \left( 1 - \|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \right)^2.$

Using the triangle inequality $\|P^t(x,\cdot) - P^t(y,\cdot)\|_{TV} \le 2 d(t)$ and expanding the
terms yields (1.19).

□
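Both inequalities of Lemma 1.10 can be observed numerically on a small reversible chain with a non-uniform stationary law. The sketch below assumes the lazy simple random walk on a path as the test chain; all helper names (`lazy_path`, `power`, `tv_from`, `sep_from`) are illustrative, and floating point arithmetic is used.

```python
def lazy_path(n):
    """Lazy simple random walk on the path 0-1-...-(n-1), n >= 2, and its stationary law."""
    deg = [1 if x in (0, n - 1) else 2 for x in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = 0.5
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 / deg[x]
    total = sum(deg)
    pi = [dx / total for dx in deg]    # stationary law proportional to the degree
    return P, pi

def power(P, t):
    """t-th power of a square matrix by repeated multiplication."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = [[sum(R[i][k] * P[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return R

def tv_from(Pt, pi, x):
    """d_x(t), given the matrix Pt = P^t."""
    return 0.5 * sum(abs(Pt[x][y] - pi[y]) for y in range(len(pi)))

def sep_from(Pt, pi, x):
    """s_x(t) as in (1.15), given the matrix Pt = P^t."""
    return max(1.0 - Pt[x][y] / pi[y] for y in range(len(pi)))
```

Comparing `tv_from(power(P, t), pi, x)` with `sep_from(power(P, t), pi, x)`, and `sep_from(power(P, 2 * t), pi, x)` with four times the maximal `tv_from` at time `t`, illustrates (1.18) and (1.19).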


We demonstrate the use of strong stationary times by analysing the separation
time of the $d$-dimensional hypercube: the separation time is defined similarly to
the mixing time in (1.8), by replacing $d(t)$ by $s(t)$.

Theorem 1.11. For the lazy random walk on the hypercube $H_d = \{0,1\}^d$,

$t_{sep}(H_d, \varepsilon) \le d \log d + \log(1/\varepsilon) d.$

Proof. We construct the following strong stationary time for the lazy random
walk on the hypercube: independently in each step, we pick a uniform coordinate
$U_t \in \{1, \dots, d\}$, and then, independently of the current values and everything else,
we set $X_t^{U_t} = 1$ with probability $1/2$ and $X_t^{U_t} = 0$ with probability $1/2$. By doing
so, the probability that the chain stays put is exactly $1/2$, and with probability $1/2$
it moves to a position chosen uniformly among all neighboring vertices, i.e., we get
exactly the transition probabilities for a lazy random walk on the hypercube.

Define $\tau_{refresh}$ as the first time that all coordinates have been chosen. Then, at
$\tau_{refresh}$, each coordinate $i \in \{1, \dots, d\}$ has been selected at least once, thus
its position is 0 or 1 with probability $1/2$ each, independently of how long we had
to wait for $\tau_{refresh}$ to happen. Also, if the original state was $x = (x_1, x_2, \dots, x_d)$,
then to reach $h(x) = (1 - x_1, 1 - x_2, \dots, 1 - x_d)$, we have to refresh each coordinate
at least once, i.e., $h(x)$ is a halting state for $\tau_{refresh}$. This shows that $\tau_{refresh}$ is a
separation-optimal strong stationary time for the lazy random walk on the hypercube.

Note that the distribution of $\tau_{refresh}$ is the same as that of the coupon collector
problem:

$s_x(t) = P_x[\tau_{refresh} > t] = P[\exists i \in \{1, \dots, d\} : \forall s \le t,\ U_s \ne i] \le d \left( 1 - \frac{1}{d} \right)^t.$

By putting $t = d \log d - \log(\varepsilon) d$, the right hand side of the previous display is at most
$d e^{-t/d} = \varepsilon$, finishing the proof. □
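The identity $s_x(t) = P_x[\tau_{refresh} > t]$ can be confirmed exactly for a small hypercube. The following sketch assumes states are encoded as bitmasks and the walk starts from the all-zero vertex; the function names are illustrative, and the coupon collector tail is computed by inclusion-exclusion over the set of coordinates that were never chosen.

```python
from fractions import Fraction
from math import comb

def lazy_hypercube_law(d, t):
    """Exact law of X_t started from 0...0; states are bitmasks in range(2**d)."""
    mu = [Fraction(0)] * (1 << d)
    mu[0] = Fraction(1)
    for _ in range(t):
        new = [Fraction(0)] * (1 << d)
        for s, mass in enumerate(mu):
            if mass:
                new[s] += mass / 2                      # stay put
                for i in range(d):
                    new[s ^ (1 << i)] += mass / (2 * d)  # flip coordinate i
        mu = new
    return mu

def separation_from_origin(d, t):
    """s_x(t) of (1.15) for x = 0...0; the stationary law is uniform, pi(y) = 2**(-d)."""
    return max(1 - m * 2 ** d for m in lazy_hypercube_law(d, t))

def refresh_tail(d, t):
    """P[tau_refresh > t]: some coordinate has not been chosen in the first t steps."""
    return sum((-1) ** (j + 1) * comb(d, j) * Fraction(d - j, d) ** t
               for j in range(1, d + 1))
```

For small `d` the two quantities agree as exact fractions, and both stay below the union bound $d(1 - 1/d)^t$ used in the proof.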

Remark 1.12. It is known (see [LPW08, Example 12.17]) that the total variation
mixing time of the hypercube is at $\frac12 d \log d$, hence we have a factor 2 between
the separation and TV mixing time on $H_d$. Comparing it to the estimate in (1.19),
this shows that the factor 2 there can be sharp.



The following lemma will be used later to determine the spectral gap of the
lamplighter chain ([LPW08, Corollary 12.6]):

Lemma 1.13. For a reversible, irreducible and aperiodic Markov chain,

(1.20)   $d_x(t) \le s_x(t) \le \frac{\lambda_*^t}{\pi_{min}},$
         $\lambda_*^t \le 2 d(t),$

with $\pi_{min} = \min_{y \in S} \pi(y)$ and $\lambda_* = \max\{|\lambda| : \lambda \text{ eigenvalue of } P,\ \lambda \ne 1\}$. As a
consequence we have

$\lim_{t \to \infty} d(t)^{1/t} = \lambda_*.$

Proof. Follows from [LPW08, Equation (12.11), (12.13)]. We note that Lemma
1.10 implies that the assertion of Lemma 1.13 stays valid if we replace $d(t)^{1/t}$ by
the separation distance $s(t)^{1/t}$. □


2. Mixing times of lamplighter graphs 

In this section, we will use the preliminaries from the previous section to determine
the mixing and relaxation time of the random walk on lamplighter graphs.
The intuitive representation of the walk is the following: a lamplighter moves according
to a simple random walk on the vertices of a base graph $G$. Further, there
is an identical lamp attached to each vertex $v \in G$, and each of the lamps is either
on or off. We denote the state of the lamp at vertex $v \in G$ by $f_v$. Then, as the
lamplighter walks along the base graph, he switches the lamps on his path on or off randomly.
More precisely, we analyze the following dynamics: one move
of the lamplighter walk corresponds to three elementary steps: he randomizes the
lamp at his current position, then he moves according to a lazy simple random walk
on the base graph, then he randomizes the lamp at his arrival position.
Suppose that $G$ is a finite connected graph with vertices $V(G)$ and edges $E(G)$.
We refer to $G$ as the base graph. Let $X(G) = \{f : V(G) \to \{0,1\}\}$ be the set of
markings of $V(G)$ by elements of $\{0,1\}$. The wreath product $\mathbb{Z}_2 \wr G$ is the graph
whose vertices are pairs $(f,x)$ where $f = (f_v)_{v \in V(G)} \in X(G)$ and $x \in V(G)$. There
is an edge between $(f,x)$ and $(g,y)$ if and only if $(x,y) \in E(G)$ and $f_z = g_z$ for
all $z \notin \{x,y\}$. Suppose that $P$ is the transition matrix for lazy random walk on
$G$. The lamplighter walk $X^\diamond$ is the Markov chain on $\mathbb{Z}_2 \wr G$ which moves from a
configuration $(f,x)$ by

(1) picking $y$ adjacent to $x$ in $G$ according to $P$, then

(2) updating each of the values of $f_x$ and $f_y$ independently to a uniform
random value in $\{0,1\}$.

The states of the lamps $f_z$ at all other vertices $z \in G$ remain fixed. It is easy to see that
with stationary distribution $\pi_G$ for the random walk on $G$, the unique stationary
distribution of $X^\diamond$ is the product measure

$\pi^\diamond((f,x)) = \pi_G(x) \cdot 2^{-|G|},$

and $X^\diamond$ is itself reversible. In these notes, we will be concerned with the special case
that $P$ is the transition matrix for the lazy random walk on $G$. In particular, $P$ is
given by

(2.1)   $P(x,y) = \frac12$ if $y = x$;   $P(x,y) = \frac{1}{2 d(x)}$ if $\{x,y\} \in E(G)$;   $P(x,y) = 0$ otherwise,

for $x, y \in V(G)$ and where $d(x)$ is the degree of $x$. This assumption guarantees that
we avoid issues of periodicity.
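To make the dynamics concrete, the transition matrix of the lamplighter walk can be written out for a tiny base graph and the claimed stationary measure checked exactly. This is a minimal sketch, assuming the base graph is given as an adjacency list `adj` on vertices `0, ..., n-1`; the function name `lamplighter_kernel` and the state encoding are illustrative. The number of states is $2^{|G|} |G|$, so this is only feasible for very small graphs.

```python
from fractions import Fraction
from itertools import product

def lamplighter_kernel(adj):
    """States (f, x) and the exact transition matrix of the lamplighter walk.

    One move: pick y according to the lazy walk (2.1), then set the lamps at
    x and y to independent uniform values in {0, 1}.
    """
    n = len(adj)
    states = [(f, x) for f in product((0, 1), repeat=n) for x in range(n)]
    index = {s: i for i, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for (f, x) in states:
        i = index[(f, x)]
        # Lazy walk on the base graph: stay w.p. 1/2, else a uniform neighbour.
        moves = [(x, Fraction(1, 2))]
        moves += [(y, Fraction(1, 2 * len(adj[x]))) for y in adj[x]]
        for y, p in moves:
            sites = sorted({x, y})     # one site if y == x, two otherwise
            for bits in product((0, 1), repeat=len(sites)):
                g = list(f)
                for site, b in zip(sites, bits):
                    g[site] = b
                P[i][index[(tuple(g), y)]] += p * Fraction(1, 2 ** len(sites))
    return states, P
```

With $\pi_G(x)$ proportional to the degree of $x$, one can then verify row by row that $\pi^\diamond((f,x)) = \pi_G(x) 2^{-|G|}$ satisfies detailed balance with respect to this matrix.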



Figure 1. A typical state of the lamplighter walk on the 2-dim 
torus on 5 vertices. Lamps that are ‘on’ (resp. ‘off’) are drawn in 
yellow (resp. blue) and the position of the lamplighter is marked 
by the dashed circle. 

We will study below the total variation (TV) mixing time and the relaxation
time of these walks. The relaxation time is a more algebraic point of view of mixing,
that looks at the spectral behavior of the transition matrix $P$. Namely, since $P$ is
a stochastic matrix, 1 is the main eigenvalue and all the other eigenvalues of $P$ lie
in the complex unit disk. If further the chain is reversible, then the eigenvalues are
real and it makes sense to define the relaxation time of the chain by

$t_{rel}(G) := \frac{1}{1 - \lambda_2},$

where $\lambda_2$ is the second largest eigenvalue of the chain.

In general it is known that for a reversible Markov chain the asymptotic behavior
of the relaxation time, the TV mixing time and a third notion, the uniform mixing time,
which is mixing in $\ell^\infty$ norm, can differ significantly, i.e. in terms of the size of the
graph $G$ they can have different asymptotics. More precisely, we have

$t_{rel}(G) \le t_{mix}(G, 1/4) \le t_{u}(G, 1/4),$

where $t_u$ denotes the uniform mixing time, see [AF02] or [LPW08]. The lamplighter walk described above is an example
where these three quantities have different orders of magnitude in terms of $|G|$.

Throughout, we use the superscript $\diamond$ to specify that a quantity belongs to the
lamplighter walk, that is, the underlying graph is $\mathbb{Z}_2 \wr G$. In order to state our
general theorems, we first need to review some basic terminology from the theory
of Markov chains. Let $P$ be the transition kernel for a lazy random walk on a finite,
connected graph $G$ with stationary distribution $\pi$.

The maximal hitting time of $P$ is

(2.2)   $t_{hit}(G) := \max_{x,y \in V(G)} E_x[\tau_y],$

where $\tau_y$ denotes the first time $t$ that $X(t) = y$ and $E_x$ stands for the expectation
under the law in which $X(0) = x$. The random cover time $\tau_{cov}$ is the first time
when all vertices have been visited by the walker $X$, and the cover time $t_{cov}(G)$ is

(2.3)   $t_{cov}(G) := \max_{x \in V(G)} E_x[\tau_{cov}].$
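For small graphs the maximal hitting time (2.2) can be computed exactly by solving the linear system $h(x) = 1 + \sum_z P(x,z) h(z)$ for $x \ne y$, with $h(y) = 0$. The sketch below assumes the base graph is given as an adjacency list and uses the lazy walk from (2.1); the helper names are illustrative, and the solver is a plain Gauss-Jordan elimination over exact fractions rather than anything optimised.

```python
from fractions import Fraction

def lazy_walk(adj):
    """Transition matrix of the lazy simple random walk, as in (2.1)."""
    n = len(adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        P[x][x] += Fraction(1, 2)
        for y in adj[x]:
            P[x][y] += Fraction(1, 2 * len(adj[x]))
    return P

def solve(A, b):
    """Gauss-Jordan elimination over the rationals (A is assumed invertible)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def hitting_times(P, y):
    """List h with h[x] = E_x[tau_y]; h[y] = 0."""
    n = len(P)
    others = [x for x in range(n) if x != y]
    A = [[(1 if u == v else 0) - P[u][v] for v in others] for u in others]
    h = solve(A, [Fraction(1)] * len(others))
    result = [Fraction(0)] * n
    for x, hx in zip(others, h):
        result[x] = hx
    return result

def max_hitting_time(adj):
    """t_hit(G) of (2.2) for the lazy walk on the graph with adjacency list adj."""
    P = lazy_walk(adj)
    return max(max(hitting_times(P, y)) for y in range(len(adj)))
```

On the cycle the result can be compared with the Gambler's ruin formula: the lazy walk needs on average $2k(n-k)$ steps to hit a vertex at distance $k$.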

Then we have the following two theorems (from [PR04]):

Theorem 2.1. Let us assume that $G$ is a regular, connected graph. Then there
exist universal constants $0 < c_1 \le C_1 < \infty$ such that the relaxation time of the
lamplighter walk on $\mathbb{Z}_2 \wr G$ satisfies

(2.4)   $c_1 t_{hit}(G) \le t_{rel}(\mathbb{Z}_2 \wr G) \le C_1 t_{hit}(G).$

Theorem 2.2. Let $G$ be a regular connected graph. Then there exist universal
constants $0 < c_2 \le C_2 < \infty$ such that the mixing time of the lamplighter walk on
$\mathbb{Z}_2 \wr G$ satisfies

(2.5)   $c_2 t_{cov}(G) \le t_{mix}(\mathbb{Z}_2 \wr G) \le C_2 t_{cov}(G).$

2.1. Proofs. Here we modify the proof that can be found in [KP12] for more
general lamp graphs to the setting where the lamp graph is $\mathbb{Z}_2$. We start by constructing
an 'almost' stationary time for the lamplighter walk. More specifically,
the first refreshment of a lamp at site $v$ is a strong stationary time on the copy at $v$
of the two-state Markov chain on $\{0,1\}$, and we stop the chain when all lamps reach
their individual stopping time, i.e. exactly when we cover all vertices. At $\tau_{cov}$, the
lamps are already stationary, but the position of the walker is not necessarily so.

It is easy to see that the states of the lamps are already stationary when $\tau_{cov}$
has happened, that is, for any starting state $(f_0, x_0)$,

(2.6)   $P_{(f_0,x_0)}[X^\diamond_t = (f,x),\ \tau_{cov} = t] = 2^{-|G|} \cdot P_{(f_0,x_0)}[X_t = x,\ \tau_{cov} = t].$

Further, if a lamp is in state $x$, then $1 - x$ is a halting state for the two-state
Markov chain. From this it is not hard to see that the vectors $((1 - f_0(v))_{v \in G}, y)$
are halting state vectors for $\tau_{cov}$ and initial state $(f_0, x_0)$ for every $x_0, y \in G$.
Lemma 2.3. For the separation distance on the lamplighter chain $\mathbb{Z}_2 \wr G$ the
following lower bound holds:

$s^\diamond_{(f_0,x_0)}(t) \ge P_{(f_0,x_0)}[\tau_{cov} > t].$

Proof. Observe that reaching the halting state vector $((1 - f_0(v))_{v \in G}, x)$ implies
the event $\tau_{cov} \le t$, so we have

(2.7)   $\frac{P_{(f_0,x_0)}[X^\diamond_t = ((1 - f_0(v))_{v \in G}, x)]}{\pi_G(x) 2^{-|G|}} = \frac{P_{(f_0,x_0)}[X^\diamond_t = ((1 - f_0(v))_{v \in G}, x),\ \tau_{cov} \le t]}{\pi_G(x) 2^{-|G|}}.$

Now pick a vertex $x = x_{x_0,t} \in G$ which minimizes $P[X_t = x_{x_0,t} \mid \tau_{cov} \le t] / \pi_G(x_{x_0,t})$.
This quotient is at most 1 since both the numerator and the denominator are
probability distributions on $G$. Then, using this and (2.6), 1 minus the right hand
side of (2.7) equals

$1 - \frac{P[X_t = x_{x_0,t} \mid \tau_{cov} \le t]}{\pi_G(x_{x_0,t})} P_{(f_0,x_0)}[\tau_{cov} \le t] \ge 1 - P_{(f_0,x_0)}[\tau_{cov} \le t].$

The separation distance is at least 1 minus the left hand side of (2.7) by definition, and
the proof of the claim follows. □

With this lemma in hand, we can already prove the lower bound in Theorem 2.2.

Proof of the lower bound for mixing time of $\mathbb{Z}_2 \wr G$. Let us set $t :=
6 t_{mix}(\mathbb{Z}_2 \wr G)$. Then Lemma 2.3 and Lemma 1.10 yield the following sequence
of inequalities:

$P_{(f_0,x_0)}[\tau_{cov} > 6 t_{mix}(\mathbb{Z}_2 \wr G)] \le s^\diamond_{(f_0,x_0)}(6 t_{mix}(\mathbb{Z}_2 \wr G)) \le 4 d^\diamond(3 t_{mix}(\mathbb{Z}_2 \wr G)) \le 4 \cdot 2^3 \cdot 4^{-3} = \frac12,$

where in the last inequality we used the sub-multiplicativity property (1.11). Note
that this estimate is independent of the starting state. Comparing the left and
right hand sides, we conclude that we can run the chain in blocks of $6 t_{mix}(\mathbb{Z}_2 \wr G)$,
and in each block the graph $G$ is covered with probability at least $1/2$. Thus, $\tau_{cov}$
can be stochastically dominated by $6 t_{mix}(\mathbb{Z}_2 \wr G) \cdot \mathrm{Geo}(1/2)$. Taking expected value
yields

$t_{cov}(G) \le 12 t_{mix}(\mathbb{Z}_2 \wr G),$

finishing the lower bound with $c_2 = 1/12$. □


Proof of the upper bound for mixing time of $\mathbb{Z}_2 \wr G$. The proof of the
upper bound in Theorem 2.2 is very similar, we just need to make the position of
the lamplighter also stationary. We can achieve this by waiting an extra strong
stationary time $\tau_G$ after the 'almost' stationary time $\tau_{cov}$ has happened. The existence of a separation
optimal strong stationary time on $G$ is ensured by Theorem 1.9.

More precisely, we have

Lemma 2.4. Let $\tau_G(x)$ be a separation-optimal strong stationary time for $G$
starting from $x \in G$ and define $\tau^\diamond$ by

(2.8)   $\tau^\diamond := \tau_{cov} + \tau_G(X_{\tau_{cov}}),$

where the chain is re-started at $\tau_{cov}$ from $(F_{\tau_{cov}}, X_{\tau_{cov}})$, run independently of the past,
and $\tau_G$ is measured in this walk. Then, $\tau^\diamond$ is a strong stationary time for $\mathbb{Z}_2 \wr G$.
The proof of this lemma is omitted here since it is not difficult but quite long,
see [KP12].

With this lemma in hand, we can apply (1.16), the relation between separation
distance and strong stationary times, to get

(2.9)   $s^\diamond_{(f_0,x_0)}(t) \le P_{(f_0,x_0)}[\tau_{cov} + \tau_G(X_{\tau_{cov}}) > t].$

Now set $t = 8 t_{cov}(G) + 10 t_{mix}(G)$. Then by a union bound the right hand side in
(2.9) is at most

(2.10)   $P_{x_0}[\tau_{cov} > 8 t_{cov}(G)] + \max_{v \in G} P_v[\tau_G > 10 t_{mix}(G)].$

The first term on the right hand side is at most $1/8$ by Markov's inequality, and
for the second term, since $\tau_G$ is separation-optimal (i.e. equality holds in (1.16)),
we can put

$P_v[\tau_G > 10 t_{mix}(G)] = s_G(10 t_{mix}(G)) \overset{*}{\le} 4 d_G(5 t_{mix}(G)) \overset{\triangle}{\le} 4 \cdot 2^5 \cdot 4^{-5} = \frac18,$

uniformly over the starting state $v$. In the inequality marked with $*$ we used Lemma
1.10, and in the one marked with $\triangle$ we used the sub-multiplicativity (1.11). Combining this
estimate with (2.10) and (2.9) and the fact that $t_{mix}(G) \le t_{hit}(G) \le t_{cov}(G)$ for all
reversible chains (see [LPW08, Chapter 10.5, 11.2]) yields that

$t_{mix}(\mathbb{Z}_2 \wr G) \le 8 t_{cov}(G) + 10 t_{mix}(G) \le 18 t_{cov}(G).$

This finishes the proof of the upper bound with $C_2 = 18$. □

Now we turn to investigate the relaxation time of $\mathbb{Z}_2 \wr G$. To do so, we will use
Lemma 1.13 and investigate the behavior of $s(t)^{1/t}$ as $t \to \infty$.

Proof of the upper bound for relaxation time of $\mathbb{Z}_2 \wr G$. To prove the
upper bound, we will estimate the tail behavior of the strong stationary time
$\tau^\diamond = \tau_{cov}(G) + \tau_G(X_{\tau_{cov}})$ in Lemma 2.4, and relate it to $s^\diamond(t)$, the separation distance
on $\mathbb{Z}_2 \wr G$. We will use $P$ for $P_{(f,x)}$ for notational convenience. Combining
(1.16) with a union bound we have

(2.11)   $s^\diamond_{(f,x)}(t) \le P_{(f,x)}[\tau^\diamond > t]$

(2.12)   $\le P[\tau_{cov}(G) > t/2]$

(2.13)   $+ \max_{v \in G} P_v[\tau_G > t/2].$

We write $\tau_w$ for the hitting time of $w \in G$. We claim that the first term (2.12) can
be bounded from above by:

(2.14)   $P[\tau_{cov}(G) > t/2] \le P[\exists w : \tau_w > t/2] \le 2 |G| e^{-\frac{\log 2 \cdot t}{4 t_{hit}(G)}},$

where $t_{hit}(G)$ is the maximal hitting time of the graph $G$, see (2.2). To see this,
use Markov's inequality on $\tau_w$ to obtain that for all starting states $v \in G$ we have
$P_v[\tau_w > 2 t_{hit}(G)] \le 1/2$, and then run the chain on $G$ in blocks of $2 t_{hit}(G)$. In each
block we hit $w$ with probability at least $1/2$, so we have

$P_v[\tau_w > k (2 t_{hit}(G))] \le 2^{-k}.$

To get a similar bound for arbitrary $t$, we can move from $\lfloor t / 2 t_{hit}(G) \rfloor$ to $t / 2 t_{hit}(G)$
by adding an extra factor of 2, and (2.14) immediately follows by a union bound.
For the second term (2.13) we prove the following upper bound:

(2.15)   $P_v[\tau_G > t/2] \le |G| e^{-\frac{t}{2 t_{rel}(G)}}.$

First note that according to Lemma 1.13, the tail of the strong stationary time
$\tau_G$ is driven by $\lambda_G$, with $\lambda_G$ being the second largest eigenvalue of the lazy random
walk on $G$. More precisely, using the first line in (1.20) we have that for any initial
state $v \in G$:

$P_v[\tau_G > t/2] \le s_G(t/2) \le \frac{\lambda_G^{t/2}}{\pi_{min}(G)} \le |G| \exp\left\{ -(1 - \lambda_G) \frac{t}{2} \right\},$

where we used that regularity of $G$ implies $\pi_{min}(G) = |G|^{-1}$, and the inequality
$1 - x \le e^{-x}$ for $x = 1 - \lambda_G$. Next we combine the bounds in (2.14) and (2.15) on
(2.11) with the second inequality in (1.20) to estimate the second largest eigenvalue $\lambda_2^\diamond$
on $\mathbb{Z}_2 \wr G$ as follows:

(2.16)   $|\lambda_2^\diamond|^t \le 2 d^\diamond(t) \le 2 s^\diamond(t) \le 4 |G| \exp\left\{ -\frac{\log 2 \cdot t}{4 t_{hit}(G)} \right\} + 2 |G| \exp\left\{ -\frac{t}{2 t_{rel}(G)} \right\}.$

In the final step we apply Lemma 1.13: we take the power $1/t$ and the limit as $t$ tends
to infinity with fixed graph size $|G|$ on the right hand side of (2.16) to get an upper
bound on $\lambda_2^\diamond$. Then we use that $1 - e^{-x} = x + o(x)$ for small $x$ and finally obtain the
bound on $t_{rel}(\mathbb{Z}_2 \wr G)$:

$t_{rel}(\mathbb{Z}_2 \wr G) \le \max\left\{ \frac{4}{\log 2} t_{hit}(G),\ 2 t_{rel}(G) \right\}.$

Then, taking into account that $t_{rel}(G) \le c\, t_{mix}(G) \le C\, t_{hit}(G)$ holds for any lazy
reversible chain (see e.g. [LPW08, Chapter 11.5, 12.2]), we can ignore the second
term. □

Proof of the lower bound for the relaxation time. We do not include
the proof of the lower bound of the relaxation time in these lecture notes since it is
based on a somewhat different technique: it relies on the analysis of the Dirichlet
form of the lamplighter walk, with an appropriately chosen test function $f$. For
more details see [LPW08, Chapter 19.2] for 0-1 lamps or [KP12] for general
lamp graphs. □

2.2. Generalized lamplighter walks. One can think of a generalisation of
lamplighter walks of the following form: instead of 0-1 lamps, put at each site
of the base graph $G$ an identical copy of a machine, whose states are represented
by a lamp graph $H$ with a fixed Markov chain transition matrix $Q$ on $H$. The
walker then does the following: as he follows a simple random walk on the base
graph, he modifies the state of the machines along his path randomly according to
the transition matrix $Q$. The state space in this case consists of a vector of the states of
the machines together with the position of the walker. We denote the corresponding graph
by $H \wr G$. One step of the lamplighter walk is then: refresh the machine at the
departure site, move one step on the base graph, refresh the machine at the arrival
site. With this dynamics, one can show that the product of the stationary
measures of $Q$ over $v \in G$, multiplied by $\pi_G$, is stationary and the
chain is reversible. We can characterise
the relaxation time of such walks as follows, from [KP12]:
Theorem 2.5. Let us assume that $G$ and $H$ are connected graphs with $G$ regular
and the Markov chain on $H$ is lazy, ergodic and reversible. Then there exist
universal constants $0 < c_1, C_1 < \infty$ such that the relaxation time of the generalized
lamplighter walk on $H \wr G$ satisfies



(2.17) 


Theorem 2.6. Assume that the conditions of Theorem 2.5 hold. Then there
exist universal constants $0 < c_2, C_2 < \infty$ such that the mixing time of the generalized
lamplighter walk on $H \wr G$ satisfies

(2.18)   $c_2 \left( t_{cov}(G) + t_{rel}(H) |G| \log |G| + |G| t_{mix}(H) \right) \le t_{mix}(H \wr G),$

If further the Markov chain is such that

(A): There is a strong stationary time $\tau_H$ for the Markov chain on $H$ which
possesses a halting state $h(x)$ for every initial starting point $x \in H$,

then the upper bound of (2.18) is sharp.

The proofs above for 0-1 lamps can be modified to work for general lamp graphs
$H$. In this case, we also have to construct an 'almost' stationary time similar to
$\tau_{\mathrm{cov}}$ and a true stationary time $\tau^*$. The first can be done by using copies of a
separation-optimal $\tau_H(v)$, $v \in G$, such that each $\tau_H(v)$ is measured only using the
transition steps of the chain on the machine $H_v$ at $v \in G$. Then we wait until all of
the $\tau_H(v)$-s have happened. One can then show that this time is 'almost' stationary
in the sense that, upon reaching it, the states of the lamp graphs are stationary, but the
position of the walker is not. An estimate similar to that in Lemma 2.3 gives a lower
bound on the separation distance. Adding an extra $\tau_G$ again gives a 'true' strong
stationary time $\tau^*$.

In most estimates for the mixing and relaxation time of $H \wr G$ we can use these
two stopping times, but there are new terms arising: one has to estimate the local-time
structure of the base graph and also the behaviour of the $\tau_H$-s. The proofs are
worked out in [KP12].

We mention that the upper and lower bounds on the mixing time for $H \wr G$ do
match for a wide selection of $H$ and $G$, but not in general. It remains an open
problem to give a general formula for the mixing time.


3. Varopoulos-Carne long range estimate 


In this section we move on to give a general bound on transition probabilities
of SRW on graphs. Later, we will use this estimate to determine the speed of RW
on different groups. Let $P = (p(x,y))$ be a transition probability matrix on state
space $S$. Assume reversibility, i.e., that $\pi(x) > 0$ and $\pi(x)p(x,y) = \pi(y)p(y,x)$ for
all $x, y \in S$.

We may consider $S$ as the vertex set of an undirected graph where $x, y$ are
adjacent iff $p(x,y) > 0$. Let $\rho(x,y)$ denote the graph distance in $S$. We assume $S$
is locally finite (each vertex has finite degree). We now state the Varopoulos-Carne
long-range estimate:

Theorem 3.1 (Varopoulos-Carne). For all $x, y \in S$ and all $t \in \mathbb{N}$,

$$ p^t(x,y) \le 2\sqrt{\frac{\pi(y)}{\pi(x)}}\; P(S_t \ge \rho(x,y)) \le 2\sqrt{\frac{\pi(y)}{\pi(x)}}\; e^{-\frac{\rho^2(x,y)}{2t}}, \tag{3.1} $$

where $(S_t)$ is simple random walk on $\mathbb{Z}$.


Remark 3.2. The Varopoulos-Carne estimate gives good bounds on transition
probabilities between vertices that are far away from each other. Another,
short-distance estimate is the following, which can be found in various forms in the
literature, see e.g. [LPW08, Theorem 17.17]. Let $P$ be the transition matrix of
lazy random walk on a graph of maximal degree $\Delta$. Then

$$ \left| P^t(x,x) - \pi(x) \right| \le \frac{\sqrt{2}\,\Delta^{5/2}}{\sqrt{t}}. $$

Proof of Theorem 3.1. We start by reducing to the finite case. Fix $t$ and
$x$. Denote $\tilde S = \{z : \rho(x,z) \le t\}$.

Now for all $z, w \in \tilde S$ consider the modified transition matrix

$$ \tilde p(z,w) = \begin{cases} p(z,w) & : z \ne w, \\ p(z,z) + p(z, S - \tilde S) & : z = w. \end{cases} $$

Then $\tilde p$ is reversible on $\tilde S$ with respect to $\pi$. Since in $t$ steps, the walk started at
$x$ cannot exit $\tilde S$, it suffices to prove the inequality for $\tilde S$ in place of $S$, so we may
assume that $S$ is finite.

Let $\xi = \cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$. Taking the $t$-th power, we see that the coefficients of
the binomial expansion are exactly the transition probabilities of SRW on $\mathbb{Z}$, which
gives

$$ \xi^t = \sum_{k=-t}^{t} P(S_t = k)\, e^{ik\theta}. $$

By taking the real part, we get

$$ \xi^t = \sum_{k=-t}^{t} P(S_t = k) \cos k\theta. \tag{3.2} $$

Now denote $Q_k(\xi) = \cos k\theta$. Observe that $Q_0(\xi) = 1$, $Q_1(\xi) = \xi$, and the identity

$$ \cos (k+1)\theta + \cos (k-1)\theta = 2 \cos\theta \cos k\theta $$

yields that $Q_{k+1}(\xi) + Q_{k-1}(\xi) = 2\xi Q_k(\xi)$ for all $k \ge 1$. Thus induction gives that
$Q_k$ is a polynomial of degree $k$ for all $k \ge 1$; these are the celebrated Chebyshev
polynomials. Further, since $Q_k(\xi) = \cos(k\theta)$ for $\xi = \cos\theta \in [-1,1]$, we have
$|Q_k(\xi)| \le 1$ for $\xi \in [-1,1]$.

Using the symmetry of the cosine function, we can rewrite (3.2) in the form

$$ \xi^t = \sum_{k=-t}^{t} P(S_t = k)\, Q_{|k|}(\xi), $$

which is an identity between polynomials. Applying it to the transition probability
matrix $P$ on $S$, we infer that

$$ P^t = \sum_{k=-t}^{t} P(S_t = k)\, Q_{|k|}(P). \tag{3.3} $$
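
The polynomial identity $\xi^t = \sum_{k} P(S_t = k)\, Q_{|k|}(\xi)$ that underlies (3.3) can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, and $Q_k$ is evaluated through the recurrence $Q_{k+1} = 2\xi Q_k - Q_{k-1}$ rather than through the cosine formula.

```python
from math import comb

def chebyshev(k, x):
    """Q_k(x), computed from Q_0 = 1, Q_1 = x, Q_{k+1} = 2x Q_k - Q_{k-1}."""
    prev, cur = 1.0, x
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 2 * x * cur - prev
    return cur

def srw_prob(t, k):
    """P(S_t = k) for simple random walk on Z started at 0."""
    if abs(k) > t or (t + k) % 2:
        return 0.0
    return comb(t, (t + k) // 2) / 2**t

def chebyshev_expansion(t, x):
    """Right-hand side: sum over k of P(S_t = k) * Q_{|k|}(x)."""
    return sum(srw_prob(t, k) * chebyshev(abs(k), x) for k in range(-t, t + 1))
```

For example, `chebyshev_expansion(7, 0.3)` should agree with `0.3**7` up to floating-point error.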


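We know that all eigenvalues of $P$ are in $[-1,1]$. Furthermore, the eigenvalues
of $Q_k(P)$ have the form $Q_k(\lambda)$, where $\lambda$ is an eigenvalue of $P$, so they are also in
$[-1,1]$. Hence $\|Q_k(P)v\|_\pi \le \|v\|_\pi$ for any vector $v$, where $\|v\|_\pi^2 = \sum_{x \in S} \pi(x) v(x)^2$
and $\langle \cdot, \cdot \rangle_\pi$ denotes the corresponding inner product.

Using this contraction property we can write

$$ Q_k(P)(x,y) = \frac{\langle \delta_x, Q_k(P)\delta_y \rangle_\pi}{\pi(x)} \le \frac{\|\delta_x\|_\pi \|\delta_y\|_\pi}{\pi(x)} = \frac{\sqrt{\pi(x)}\sqrt{\pi(y)}}{\pi(x)} = \sqrt{\frac{\pi(y)}{\pi(x)}}. $$

Note that $P^k(x,y) = 0$ for all $k < \rho(x,y)$ implies $Q_k(P)(x,y) = 0$ for $k < \rho(x,y)$.
Hence, by (3.3), we have

$$ p^t(x,y) = \sum_{|k| \ge \rho(x,y)} P(S_t = k)\, Q_{|k|}(P)(x,y) \le \sum_{|k| \ge \rho(x,y)} P(S_t = k) \sqrt{\frac{\pi(y)}{\pi(x)}}, $$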



proving the first inequality in (3.1). The second inequality in (3.1) is an application 
of the well-known Bernstein-Chernoff bound 

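$$ P(S_t \ge R) \le e^{-\frac{R^2}{2t}}. \tag{3.4} $$

For the reader's convenience we recall the proof. Suppose that $P(X = 1) = 1/2 =
P(X = -1)$. Then

$$ E(e^{\lambda X}) = \frac{e^{\lambda} + e^{-\lambda}}{2} = \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{(2k)!} \le \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{2^k k!} = e^{\lambda^2/2}. $$

Therefore,

$$ E(e^{\lambda S_t}) = \left(E(e^{\lambda X})\right)^t \le e^{t\lambda^2/2}. $$

Finally, by Markov's inequality,

$$ P(S_t \ge R) = P(e^{\lambda S_t} \ge e^{\lambda R}) \le e^{-\lambda R} e^{t\lambda^2/2}. $$

Optimizing, we choose $\lambda = R/t$, and (3.4) follows.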


□ 
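
Both inequalities of the Varopoulos-Carne estimate can be verified exactly in a small example. The sketch below is an illustration under stated assumptions, not part of the proof: it takes simple random walk on the cycle $\mathbb{Z}_n$ (so $\pi$ is uniform and the factor $\sqrt{\pi(y)/\pi(x)}$ equals 1), computes $p^t(0,\cdot)$ by iterating the transition rule, and compares it with $2P(S_t \ge \rho)$ and with the Chernoff bound $e^{-\rho^2/(2t)}$. All function names are hypothetical.

```python
from math import comb, exp

def srw_tail(t, r):
    """Exact P(S_t >= r) for simple random walk on Z started at 0."""
    # S_t = 2B - t with B ~ Bin(t, 1/2), so S_t >= r iff B >= ceil((t + r) / 2).
    kmin = -(-(t + r) // 2)
    return sum(comb(t, k) for k in range(max(kmin, 0), t + 1)) / 2**t

def cycle_transition_probs(n, t):
    """The row p^t(0, .) for simple random walk on the cycle Z_n."""
    row = [0.0] * n
    row[0] = 1.0
    for _ in range(t):
        row = [0.5 * row[(y - 1) % n] + 0.5 * row[(y + 1) % n] for y in range(n)]
    return row

def check_varopoulos_carne(n, t, tol=1e-12):
    """True if p^t(0,y) <= 2 P(S_t >= rho) <= 2 exp(-rho^2 / (2t)) for every y."""
    row = cycle_transition_probs(n, t)
    for y in range(n):
        rho = min(y, n - y)  # graph distance from 0 to y on the cycle
        tail = srw_tail(t, rho)
        if row[y] > 2 * tail + tol or tail > exp(-rho**2 / (2 * t)) + tol:
            return False
    return True
```

Calling `check_varopoulos_carne(12, 20)` runs the comparison for every target vertex of a 12-cycle after 20 steps.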


4. Speed of RW on groups and harmonic functions 

In this section we characterize the speed of random walk on groups in terms of 
bounded harmonic functions. For more on this topic see Chapter 13 in [LP15]. 

Let $G$ be a (finite or countable) group, with finite generating set $S$. We assume
$S = S^{-1}$, and $d = |S|$. Recall the right-Cayley graph on $G$ is given by $x \sim y \iff y \in
xS$, and the corresponding simple random walk (SRW) has

$$ p_{SRW}(x,y) = \begin{cases} \frac{1}{d}, & \text{for } y \in xS, \\ 0, & \text{otherwise.} \end{cases} \tag{4.1} $$

We define the lazy random walk (LRW) to avoid periodicity issues:

$$ p(x,y) = \begin{cases} \frac{1}{2}, & \text{for } y = x, \\ \frac{1}{2d}, & \text{for } y \in xS, \\ 0, & \text{otherwise.} \end{cases} $$

That is, the transition matrix is $P = (P_{SRW} + I)/2$. We call $e \in G$ the origin, and
denote by $\rho$ the graph distance in $G$. We write simply $\rho(e,x) = |x|$.


Definition 4.1. The speed of random walk on $G$ is defined as

$$ v(G) := \lim_{n \to \infty} \frac{E|X_n|}{n} = \text{a.s.-}\lim_{n \to \infty} \frac{|X_n|}{n}. $$

This definition is valid, since the distance is subadditive by the triangle inequality
and the transitivity of $G$:

$$ \rho(e, X_{n+m}) \le \rho(e, X_n) + \rho(X_n, X_{n+m}), $$

where $\rho(X_n, X_{n+m})$ has the same distribution as $\rho(e, X_m)$.
Taking expectation yields that the expected distance is subadditive, hence
the speed exists.

The main goal here is to characterize when the speed is positive. But first
some examples:

Example 4.2. For every $d$, $v(\mathbb{Z}^d) = 0$. This is easy to see since $(E|X_n|)^2 \le
E(|X_n|^2) = \sum_{i=1}^{n} E(|Y_i|^2) = n$, where $Y_i$ denotes the independent unit length
increment of the walk at step $i$.

Example 4.3. The speed on the infinite $d$-ary tree $\mathbb{T}_d$ is $v(\mathbb{T}_d) = \frac{d-2}{d}$. In each
step of the walk, there are $d - 1$ edges increasing the distance from the root by $+1$
and exactly 1 edge decreasing the distance, hence the speed is $\frac{d-2}{d}$ for non-lazy
RW and $\frac{d-2}{2d}$ for lazy RW.
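
The drift computation for the tree can be illustrated by an exact calculation. The sketch below assumes the tree in which every vertex has degree $d$ (as in the edge count above), so that the distance from the root is a birth-and-death chain: from 0 it moves to 1, and from $k \ge 1$ it moves up with probability $(d-1)/d$ and down with probability $1/d$. The function name is a hypothetical choice; it returns $E|X_n|$ for the non-lazy walk, which divided by $n$ should approach $(d-2)/d$.

```python
def expected_distance_on_tree(d, n):
    """E|X_n| for non-lazy SRW started at the root of the tree with all degrees d."""
    up, down = (d - 1) / d, 1 / d
    dist = [0.0] * (n + 2)  # dist[k] = P(|X_t| = k)
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (n + 2)
        new[1] += dist[0]  # at the root every step increases the distance
        for k in range(1, n + 1):
            if dist[k]:
                new[k + 1] += up * dist[k]
                new[k - 1] += down * dist[k]
        dist = new
    return sum(k * p for k, p in enumerate(dist))
```

With $d = 2$ the same chain is the absolute value of SRW on $\mathbb{Z}$, where the ratio $E|X_n|/n$ tends to zero, in line with Example 4.2.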

The third example needs some definitions: 

Definition 4.4. A state of the lamplighter group $G_d$ on $\mathbb{Z}^d$ is defined as $(S, x)$
where $S \subset \mathbb{Z}^d$ is a finite subset of vertices and $x \in \mathbb{Z}^d$ is the position of a marker
or lamplighter. Every state in $G_d$ is connected to $2d + 1$ other states in $G_d$: either
the marker moves to a uniformly chosen neighbour of $x$ or it switches the lamp at
$x$, i.e. removes $x$ from $S$ if $x \in S$, and adds $x$ to $S$ if $x \notin S$. The origin in this walk
is $(\emptyset, 0)$, i.e. all lamps off, marker at the origin.

The set $S$ describes which 'lamps' are on, and the marker can switch lamps
only along his path. He either moves on the base graph $\mathbb{Z}^d$ or switches the lamp
where he currently is.

Example 4.5. The speed of the lamplighter walk on $G_1$ and $G_2$ is zero, while
$v(G_d) > 0$ for $d \ge 3$.

Proof. For $G_1$ we can use that the marginal distribution of the marker is just a
SRW on $\mathbb{Z}$, hence its range up to time $n$ is whp less than $c\sqrt{n \log n}$. Thus, in any
state that the lamplighter can reach in $n$ steps, the on-lamps are contained in a
connected set of size at most $c\sqrt{n \log n}$. Such a state has distance at most $K\sqrt{n \log n}$ from the origin,
since the marker can just walk along its range, switch off each lamp that is on and
return to the origin, taking at most $K\sqrt{n \log n}$ steps for some $K > 0$.

For $G_2$, the range of SRW on $\mathbb{Z}^2$ is whp of order $n/\log n$, so the same argument can be
applied to show that the speed is zero.

For $d \ge 3$, the range of SRW on $\mathbb{Z}^d$ is linear in $n$, and with positive probability
there are going to be a linear number of lamps on, hence the speed is positive,
too. □


Discussion. We see that it is not the growth rate that characterizes the speed:
trees and lamplighter groups both grow exponentially. What does characterize the
speed? The answer is given by bounded harmonic functions.


4.1. Bounded harmonic functions and tail $\sigma$-algebras. We start with a
definition:

Definition 4.6. We say that a bounded function $u : G \to \mathbb{R}$ is harmonic for
the simple random walk on $G$ if

$$ u(x) = \frac{1}{d} \sum_{y \sim x} u(y), $$

that is, we have $u = P_{SRW} u$, or equivalently $u = Pu$.

We define the tail $\sigma$-algebra as $\mathcal{T} = \bigcap_n \sigma(X_n, X_{n+1}, \dots)$. $\mathcal{T}$ contains all events
which do not depend on the trajectory up to any fixed finite time. Tail events can
easily generate harmonic functions; we list some examples:

(1) On $\mathbb{T}_d$, does the RW end up eventually in a given sub-branch of the tree?

$u_1(x) := P_x(\text{RW ends in a given sub-branch of the tree})$

(2) On $G_d$ with $d \ge 3$, is the lamp at $y$ eventually on?

$u_2(x) := P_x(\text{the lamp at } y \text{ is going to be eventually on})$

$u_3(x) := P_x(\text{the lamps in the subset } A \text{ are all going to be eventually on})$

One can easily argue that $u_1, u_2, u_3$ are non-constant by moving the starting point
$x$ further and further away from the points / sets under consideration and using
transience properties of the marker.

Definition 4.7. We call $f = f(X_0, X_1, X_2, \dots)$ a tail function if changing
finitely many values in the trajectory $(X_0, X_1, X_2, \dots)$ does not change the value
of $f$.

Claim 4.8. Every tail function generates a bounded harmonic function by

$$ u_f(x) = E_x(f(X_0, X_1, X_2, \dots)) $$

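for random walk on groups or for lazy chains.

Proof. We prove it for lazy chains only. First we start with a total variation
bound on binomial random variables^1:

$$ \left\| \mathrm{Bin}(n, \tfrac12) - \mathrm{Bin}(n+1, \tfrac12) \right\|_{TV}
= 2^{-n-2} \sum_{k=0}^{n+1} \left| 2\binom{n}{k} - \binom{n+1}{k} \right|
= 2^{-n-2} \sum_{k=0}^{n+1} \left| \binom{n}{k} - \binom{n}{k-1} \right|
= 2^{-n-1} \binom{n}{\lfloor n/2 \rfloor}
= \frac{1+o(1)}{\sqrt{2\pi n}}. \tag{4.3} $$

First fix some $\varepsilon > 0$ and pick $n$ large enough such that $2\|f\|_\infty$ times the
right-hand side of (4.3) is less than $\varepsilon$. Look at
two copies of the lazy walk: $(X_0, X_1, X_2, \dots)$ and $(\tilde X_0, \tilde X_1, \tilde X_2, \dots)$. We can then
construct a coupling between these two trajectories by using a non-lazy random
walk $Y$, and set $X_n = Y_{\mathrm{Bin}(n, 1/2)}$ and $\tilde X_{n+1} = Y_{\mathrm{Bin}(n+1, 1/2)}$. The bound in (4.3)
and the coupling characterisation of total variation distance (1.4) tell us that
we can couple these two trajectories such that
$2\|f\|_\infty P(X_n \ne \tilde X_{n+1}) \le 2\|f\|_\infty P(\mathrm{Bin}(n, \tfrac12) \ne \mathrm{Bin}(n+1, \tfrac12)) < \varepsilon$. Hence, we can write

$$ (P u_f)(x) - u_f(x) = E_x(f(X_1, X_2, \dots)) - E_x(f(X_0, X_1, X_2, \dots)) $$
$$ = E_x(f(\tilde X_1, \tilde X_2, \dots)) - E_x(f(X_0, X_1, X_2, \dots)) $$
$$ \le 2\|f\|_\infty \cdot P(X_n \ne \tilde X_{n+1}) < \varepsilon, $$

where in the last step we used that if the two trajectories are coupled by time $n$,
then clearly they only differ in finitely many steps, and $f$ is a tail function, hence
it takes the same value on $\{X_n = \tilde X_{n+1}\}$. Since $\varepsilon$ was arbitrary and the same
bound holds for $-f$, we get $P u_f = u_f$,
finishing the proof. □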
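
The total variation distance between $\mathrm{Bin}(n, 1/2)$ and $\mathrm{Bin}(n+1, 1/2)$ used in this proof can be computed exactly. The Python sketch below (function names are hypothetical) evaluates the distance from the definition with rational arithmetic and compares it with the closed form $2^{-n-1}\binom{n}{\lfloor n/2 \rfloor}$, which decays like $1/\sqrt{2\pi n}$.

```python
from fractions import Fraction
from math import comb

def tv_binomials(n):
    """Exact total variation distance between Bin(n, 1/2) and Bin(n+1, 1/2)."""
    total = Fraction(0)
    for k in range(n + 2):
        p = Fraction(comb(n, k), 2**n)            # comb(n, n + 1) is 0
        q = Fraction(comb(n + 1, k), 2**(n + 1))
        total += abs(p - q)
    return total / 2

def closed_form(n):
    """The closed form 2^{-n-1} * C(n, floor(n/2))."""
    return Fraction(comb(n, n // 2), 2**(n + 1))
```

For small $n$ the two functions return identical fractions, for instance $1/4$ when $n = 1$.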


The reverse direction is also true:

Claim 4.9. Every bounded harmonic function $u$ defines a tail function $f_u$ by

$$ f_u(X_0, X_1, X_2, \dots) := \limsup_{n \to \infty} u(X_n). $$

Proof. Since $u$ is bounded and harmonic, $u(X_n)$ is a bounded
martingale. Hence, by the martingale convergence theorem, it converges a.s.
Further, the definitions in the two claims give a correspondence between
bounded harmonic functions and tail functions, since $u_{f_u}(x) = E_x(f_u(X_0, X_1, \dots)) =
E_x(\limsup u(X_n)) = u(x)$ by the martingale stopping theorem. □

We call a $\sigma$-algebra $\mathcal{F}$ trivial if $P_x(A) \in \{0,1\}$ for all $A \in \mathcal{F}$.

We will need the following equivalence.

Theorem 4.10. For random walk on a group, the tail $\sigma$-algebra $\mathcal{T}$ is trivial if
and only if every bounded harmonic function on $G$ is constant.

Proof. Suppose first that $\mathcal{T}$ is trivial. Let $u$ be a bounded harmonic function.
Then $\limsup u(X_n)$ is a tail function, so it must be constant a.s. By irreducibility,
this constant $c$ does not depend on the starting point. Writing $u(x) = E_x u(X_n)$ and
passing to the limit using the bounded convergence theorem proves that $u(x) = c$ for
all $x$. This direction is valid for any irreducible Markov chain. The other direction is
not hard to verify for lazy irreducible Markov chains: Suppose all bounded harmonic
functions are constant, and $A \in \mathcal{T}$. Then it is easy to check that $u(x) = P_x(A)$ is a
harmonic function, hence constant, so the Levy zero-one law implies that $P_x(A) \in \{0,1\}$ for every
$x$. Without assuming laziness, but using the group structure instead, one can also
show that $u$ is harmonic. This can be proved using entropy or via Derriennic's
zero-two law [Der76], see Chapter 13 in [LP15] for details. □

^1 We set $\binom{n}{-1} := 0$.


Entropy. To state the next theorem, we need some basic properties of entropy,
which we include here for the reader's convenience.

Definition 4.11. The entropy of a random variable $X$ with distribution $p$ on
state space $S$ is defined as

$$ H(X) := \sum_{x \in S} p_x \log\left(\frac{1}{p_x}\right), $$

and the relative entropy of a measure $P$ with respect to another measure $Q$ on the
same state space $S$ is defined as

$$ D(P|Q) := \sum_{x \in S} p_x \log\left(\frac{p_x}{q_x}\right). $$

The relative entropy is always nonnegative since $\log t \le t - 1$ for $t > 0$, hence

$$ -D(P|Q) = \sum_{x \in S} p_x \log\left(\frac{q_x}{p_x}\right) \le \sum_{x \in S} p_x \left(\frac{q_x}{p_x} - 1\right) = 0. $$

Finally, the conditional entropy is defined as the entropy of the conditional measure
$p(x|y) = p(x,y)/p(y)$, averaged over $y$, i.e.

$$ H(X|Y) := \sum_{y} p(y) \sum_{x} p(x|y) \log\left(\frac{1}{p(x|y)}\right). $$

We write $H(X,Y)$ for the entropy of the joint distribution of $(X,Y)$. Then it is
not hard to see that

$$ H(X|Y) = H(X,Y) - H(Y) \le H(X), $$

since $H(X) + H(Y) - H(X,Y) = D(p_{X,Y} | p_X \cdot p_Y) \ge 0$, with equality if and only if $X$
and $Y$ are independent. As a corollary we get that for any three random variables

$$ H(X|Y, Z = z) \le H(X|Z = z) \implies H(X|Y,Z) \le H(X|Z). \tag{4.4} $$
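
The quantities of Definition 4.11 are easy to compute for finite distributions. The following Python sketch is an illustration with hypothetical function names: distributions are lists of probabilities, a joint distribution is a nested list with `joint[x][y]` $= P(X = x, Y = y)$, and the conditional entropy is obtained through the identity $H(X|Y) = H(X,Y) - H(Y)$.

```python
from math import log

def entropy(p):
    """H(X) = sum of p_x * log(1 / p_x), ignoring zero-probability states."""
    return sum(px * log(1 / px) for px in p if px > 0)

def relative_entropy(p, q):
    """D(P|Q) = sum of p_x * log(p_x / q_x); assumes q_x > 0 wherever p_x > 0."""
    return sum(px * log(px / qx) for px, qx in zip(p, q) if px > 0)

def conditional_entropy(joint):
    """H(X|Y) computed as H(X,Y) - H(Y) for joint[x][y] = P(X = x, Y = y)."""
    flat = [v for row in joint for v in row]
    p_y = [sum(col) for col in zip(*joint)]
    return entropy(flat) - entropy(p_y)
```

For a correlated pair such as `[[0.4, 0.1], [0.1, 0.4]]` the conditional entropy is strictly smaller than $H(X) = \log 2$, while for an independent pair the two coincide.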

It can also be shown that the uniform distribution on a set $S$ (with $|S| = n$) maximizes
the entropy:

$$ 0 \le D(P | U[S]) = \sum_{x \in S} p_x \log(n p_x) = \log n - H(X). $$

4.2. The Kaimanovich-Vershik-Varopoulos theorem. The next theorem
is by Kaimanovich-Vershik ('83) [KV83] and Varopoulos ('85) [Var85].


Theorem 4.12. For random walk on a group $G$, the following are equivalent:

(1) the speed $v(G) > 0$,

(2) there exists a bounded non-constant harmonic function $u$ on $G$,

(3) the entropy of the walk $h = \lim_{n \to \infty} \frac{H(X_n)}{n} > 0$.

Proof. First we show (2) $\iff$ (3). Write the joint entropy in two ways:

$$ H(X_k, X_n) = H(X_k) + H(X_n|X_k) = H(X_k) + H(X_{n-k}), $$
$$ H(X_k, X_n) = H(X_n) + H(X_k|X_n). $$

Rearranging and taking $k = 1$ yields that

$$ H(X_1) + H(X_{n-1}) - H(X_n) = H(X_1|X_n) = H(X_1|(X_n, X_{n+1}, \dots)), \tag{4.5} $$

where the last equality is due to the Markov property. Since conditioning on less
information increases the entropy (see (4.4)), $H(X_1|(X_n, X_{n+1}, \dots))$ is an increasing
function of $n$. So, the left hand side in (4.5) is also increasing, so we get
that $h_n := H(X_n) - H(X_{n-1})$ is decreasing. Hence, $h_n \to h$ for some $h \ge 0$.
So we get that $\frac{H(X_n)}{n} \to h$. Now if $h > 0$, then taking $n \to \infty$ in (4.5) gives
$H(X_1|\mathcal{T}) = H(X_1) - h$, that is, conditioning on $\mathcal{T}$ influences the entropy: hence $\mathcal{T}$
cannot be trivial. On the other hand, if $h = 0$ then $H(X_k|\mathcal{T}) = H(X_k)$ for all $k$,
hence the tail $\mathcal{T}$ is independent of $\{X_1, X_2, \dots, X_k\}$. Thus, it must be trivial itself.
Next we show (3) $\iff$ (1). Apply the Varopoulos-Carne estimate on transitive
groups to see that $p_n(x) \le 2e^{-\frac{|x|^2}{2n}}$, and use this estimate on $-\log p_n(x)$ in the
definition of $H(X_n)$ to get

$$ H(X_n) = \sum_{x} p_n(x)(-\log p_n(x)) \ge \sum_{x} p_n(x)\left(-\log 2 + \frac{|x|^2}{2n}\right). $$

Rearranging terms and dividing by $n$ yields

$$ \frac{\log 2 + H(X_n)}{n} \ge \frac{E|X_n|^2}{2n^2} \ge \frac{1}{2}\frac{(E|X_n|)^2}{n^2} = \frac{1}{2}\left(\frac{E|X_n|}{n}\right)^2, $$

where we used Jensen's inequality. Now clearly $v(G) > 0$ implies
$h > 0$.

On the other hand, we can define the spheres $S_k := \{y \in G : |y| = k\}$ and the
measure $Q(x) = \frac{2^{-k-1}}{|S_k|}$ if $x \in S_k$, which is a probability measure on $G$. We calculate the
relative entropy

$$ 0 \le D(P^n | Q) \le E(|X_n| + 1)\log 2d - H(X_n), $$

where we used the bound $-\log Q(x) \le \log (2d)^{|x|+1}$ since the degree is $d$. Now
dividing by $n$ yields

$$ \frac{E[|X_n| + 1]\log 2d}{n} \ge \frac{H(X_n)}{n}, $$

and passing to the limit shows that if $h = \lim_n \frac{H(X_n)}{n} > 0$ then the speed is also
positive. This finishes the proof. □


5. Geometric bounds on mixing times

Let $G$ be a (finite or countable) group, with finite generating set $S$. We assume
$S = S^{-1}$, and $d = |S|$. Recall the right-Cayley graph on $G$ is given by $x \sim y \iff y \in
xS$, and consider simple random walk on $G$ as in (4.1). Let $\rho$ denote graph distance
in $G$.

Theorem 5.1. For simple random walk on $G = \langle S \rangle$,

(a) If $|G| < \infty$, then $E[\rho(X_0, X_n)^2] \ge \frac{n}{2d}$ for $n \le \frac{1}{1-\lambda}$, where $\lambda = \lambda_2$ is the
second eigenvalue.

(b) If $|G| = \infty$ and $G$ is amenable, then $E[\rho(X_0, X_n)^2] \ge \frac{n}{d}$ for all $n \ge 1$.

Remark 5.2. (1) The theorem is proved in Lee-Peres [LP13] in the more
general setting of random walks on transitive graphs.

(2) Part (b) for Cayley graphs was first discovered by Anna Ershler (unpublished),
who relied on a harmonic embedding theorem of Mok.

(3) If $G$ is nonamenable, then we know that $E\rho(X_0, X_n) \ge cn$, so that
$E[\rho(X_0, X_n)^2] \ge c^2 n^2$ for some constant $c > 0$.

Theorem 5.1 for finite, transitive graphs gives a very general upper bound on
relaxation and mixing times of finite groups:

Corollary 5.3. Write $\mathrm{diam}(G)$ for the diameter of $G = \langle S \rangle$. Then

$$ t_{\mathrm{rel}}(G) \le 2d \cdot \mathrm{diam}(G)^2, \qquad t_{\mathrm{mix}}(G) \le 2d \cdot \mathrm{diam}(G)^2 \cdot \log |G|. \tag{5.1} $$

It is an open problem whether $t_{\mathrm{mix}}(G) \le C d \cdot \mathrm{diam}(G)^2$ holds for every transitive
finite chain.


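Proof of Corollary 5.3. Apply part (a) of Theorem 5.1 with $n = t_{\mathrm{rel}}(G)$:

$$ \mathrm{diam}(G)^2 \ge E[\rho(X_0, X_n)^2] \ge \frac{t_{\mathrm{rel}}(G)}{2d}. $$

For the second inequality, use [LPW08, Theorem 12.3] stating that $t_{\mathrm{mix}}(G) \le
\log\left(\frac{1}{\pi_{\min}}\right) t_{\mathrm{rel}}(G)$. □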
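
Theorem 5.1(a) and the relaxation time bound of Corollary 5.3 can be checked by direct computation on a cycle. The sketch below is an illustration under explicit assumptions: $G = \mathbb{Z}_m$ with generators $\pm 1$ (so $d = 2$), second eigenvalue $\lambda_2 = \cos(2\pi/m)$, relaxation time taken to be $1/(1 - \lambda_2)$, and diameter $\lfloor m/2 \rfloor$. The function names are hypothetical.

```python
from math import cos, pi

def cycle_second_moment(m, n):
    """E[rho(X_0, X_n)^2] for simple random walk on the cycle Z_m."""
    row = [0.0] * m
    row[0] = 1.0
    for _ in range(n):
        row = [0.5 * row[(y - 1) % m] + 0.5 * row[(y + 1) % m] for y in range(m)]
    return sum(p * min(y, m - y) ** 2 for y, p in enumerate(row))

def check_cycle(m, tol=1e-12):
    """Check E[rho^2] >= n/(2d) for n <= 1/(1 - lambda) and t_rel <= 2 d diam^2."""
    d = 2
    lam = cos(2 * pi / m)        # second largest eigenvalue of SRW on Z_m
    t_rel = 1 / (1 - lam)        # assumed definition of the relaxation time
    diam = m // 2
    moments_ok = all(
        cycle_second_moment(m, n) >= n / (2 * d) - tol
        for n in range(1, int(t_rel) + 1)
    )
    return moments_ok and t_rel <= 2 * d * diam**2
```

On the cycle the relaxation time is of order $m^2$, so the diameter bound is sharp up to a constant in this example.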

To prove Theorem 5.1, we use the following key lemma from [LP13] (that
is valid for transitive graphs as well). We define the Dirichlet forms
$Q_n(f) := \langle (I - P^n)f, f \rangle$.

Lemma 5.4. For the simple random walk on $G$ as in Theorem 5.1 and any
$f \in \ell^2(G)$, we have

$$ E[\rho(X_0, X_n)^2] \ge \frac{1}{d}\,\frac{Q_n(f)}{Q_1(f)}. $$

Proof of Theorem 5.1 (finite case) from Lemma 5.4. In the finite case,
take $f$ as an eigenfunction such that $Pf = \lambda f$ with $\|f\|_2 = 1$. Then $Q_n(f) = 1 - \lambda^n$.
Using the condition $n \le \frac{1}{1-\lambda}$, we can write

$$ d \cdot E[\rho(X_0, X_n)^2] \ge \frac{1 - \lambda^n}{1 - \lambda} = \sum_{i=0}^{n-1} \lambda^i \ge \sum_{i=0}^{n-1} \left(1 - \frac{1}{n}\right)^i \ge \sum_{i=0}^{n-1} \left(1 - \frac{i}{n}\right) \ge \frac{n}{2}. $$

□

The infinite case is harder and will be proved later.

Proof of Lemma 5.4. Given $f \in \ell^2(G)$, construct $F : G \to \ell^2(G)$ by $F(x) :=
\{f(gx)\}_{g \in G}$. Compute (with $X_0 = x_0$)

$$ E\|F(X_0) - F(X_1)\|_2^2 = E \sum_{g \in G} |f(gX_0) - f(gX_1)|^2 = \sum_x \sum_y |f(x) - f(y)|^2 p(x,y) $$
$$ = \sum_x \sum_y [f(x)^2 + f(y)^2 - 2f(x)f(y)]\,p(x,y) = 2\langle (I - P)f, f \rangle = 2Q_1(f). \tag{5.2} $$

Similarly, $E\|F(X_0) - F(X_n)\|_2^2 = 2Q_n(f)$. Now, (5.2) implies that

$$ \frac{1}{d}\|F(x_0) - F(y)\|_2^2 \le 2Q_1(f) $$

for any $x_0, y$ with $x_0 \sim y$. Thus, $F$ is Lipschitz with $\mathrm{Lip}(F) \le \sqrt{2dQ_1(f)}$. Therefore,

$$ 2Q_n(f) = E\|F(X_n) - F(X_0)\|_2^2 \le (\mathrm{Lip}(F))^2 E[\rho(X_0, X_n)^2] \le 2dQ_1(f)\,E[\rho(X_0, X_n)^2]. $$

Rearranging proves Lemma 5.4. □


Now we turn to the proof of Theorem 5.1 for infinite $G$. We will need the
following lemma:

Lemma 5.5. Given $f \in \ell^2(G)$,

$$ E[\rho(X_0, X_n)^2] \ge \frac{n}{d} - \frac{n^2}{2d}\,\frac{\|(I - P)f\|_2^2}{Q_1(f)}. $$

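Proof. We use Lemma 5.4. We need to lower bound $Q_n(f)$, and show that it
grows almost linearly. For this, we use the differences and bound second differences
as follows:

$$ \Delta_j = Q_{j+1}(f) - Q_j(f) = \langle P^j f - P^{j+1} f, f \rangle = \langle (I - P)P^j f, f \rangle = \langle P^j f, (I - P)f \rangle. $$

Thus,

$$ |\Delta_j - \Delta_{j-1}| = |\langle P^{j-1}(I - P)f, (I - P)f \rangle| \le \|P^{j-1}(I - P)f\|_2 \cdot \|(I - P)f\|_2 \le \|(I - P)f\|_2^2 =: \delta, $$

by Cauchy-Schwarz. Now $\Delta_0 = Q_1(f)$ and $\Delta_j \ge \Delta_0 - j\delta$, whence

$$ Q_n(f) = \sum_{j=0}^{n-1} \Delta_j \ge nQ_1(f) - \frac{n^2}{2}\delta. $$

Thus,

$$ \frac{Q_n(f)}{Q_1(f)} \ge n - \frac{n^2 \|(I - P)f\|_2^2}{2Q_1(f)}, $$

and the lemma follows from Lemma 5.4. □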


Proving the theorem for $G$ infinite is harder; we first give the proof under an
additional assumption.

Assumption 5.6. Suppose that $\sum_{i \ge 0} (P^i 1_{\{x_0\}})(x) =: \mathrm{Green}(x_0, x)$ is in $\ell^2(G)$.

Proof of Theorem 5.1 (infinite case) assuming Assumption 5.6. Note
that Lemma 5.5 gives the statement of the theorem if we can find a sequence of functions
$f_k$ for which $\frac{\|(I - P)f_k\|_2^2}{Q_1(f_k)} \to 0$.

Let $\{A_k\}$ be a sequence of Folner sets, i.e., $\delta_k := \frac{|\partial_E A_k|}{|A_k|} \to 0$ as $k \to \infty$. Here
$\partial_E A$ denotes the edge-boundary of the set $A$, i.e. the edges between $A$ and $A^c$.
Write $\psi_k = 1_{A_k}$ and $f_k = \sum_{i \ge 0} P^i \psi_k$. Assumption 5.6 implies that $f_k \in \ell^2(G)$.
Note that $(I - P)f_k = \psi_k$ and $f_k(x) = E_x\left[\sum_{i \ge 0} 1_{\{X_i \in A_k\}}\right]$. If $\rho(x, A_k^c) \ge r$, then
$f_k(x) \ge r$, so combining these yields

$$ Q_1(f_k) = \langle (I - P)f_k, f_k \rangle = \sum_{x \in A_k} f_k(x) \ge r\,|\{x \in A_k : \rho(x, A_k^c) \ge r\}| $$
$$ \ge r\left[|A_k| - d^r |\partial_E A_k|\right] = r|A_k|(1 - d^r \delta_k). $$

Letting $k \to \infty$ gives $\liminf_{k \to \infty} \frac{Q_1(f_k)}{|A_k|} \ge r$, whence $\frac{Q_1(f_k)}{|A_k|} \to \infty$ since $r$ was
arbitrarily large. By Lemma 5.5,

$$ E[\rho(X_0, X_n)^2] \ge \frac{n}{d} - \frac{n^2 |A_k|}{2dQ_1(f_k)}. $$

Letting $k \to \infty$ proves the theorem assuming Assumption 5.6.

□


Removing Assumption 5.6. For the next lemma, we recall that if $P$ is
transient or null-recurrent, then we have the pointwise limit,

$$ P^i f \to 0 \quad \text{for every } f \in \ell^2(V). \tag{5.3} $$

Lemma 5.7. Suppose that $P$ satisfies (5.3) and, for some $\theta \in (0, \frac{1}{2})$, there exists
an $f \in \ell^2(V)$ with $\|f\|_2 = 1$ and $\|Pf - f\|_2 \le \theta$. Then there exists a $\varphi \in \ell^2(V)$
such that

$$ \frac{\|(I - P)\varphi\|_2^2}{\langle \varphi, (I - P)\varphi \rangle} \le 32\,\theta. \tag{5.4} $$


Proof of Theorem 5.1 for infinite $G$ without Assumption 5.6. The proof
follows by picking $f := \psi_k = 1_{A_k}/\sqrt{|A_k|}$ for the Folner sets defined above. By
picking $k$ large enough, $\psi_k$ satisfies the condition of Lemma 5.7 for arbitrarily small
$\theta > 0$, since in this case $\|f\|_2^2 = |A_k|/|A_k| = 1$ and

$$ \|P\psi_k - \psi_k\|_2^2 \le \frac{2}{|A_k|} \sum_{x \in A_k} P_x(X_1 \in A_k^c) \le \frac{|\partial_E A_k|}{|A_k|} = \delta_k \to 0. $$

Combining these with Lemma 5.5 then yields the proof.

□


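Proof of Lemma 5.7. Given $f \in \ell^2(V)$ and $k \in \mathbb{N}$, we define $\varphi_k \in \ell^2(V)$ by

$$ \varphi_k = \sum_{i=0}^{k-1} P^i f. $$

First, using $(I - P)\varphi_k = (I - P^k)f$ and the fact that $P$ is a contraction, we have

$$ \|(I - P)\varphi_k\|_2^2 \le 4\|f\|_2^2. \tag{5.5} $$

On the other hand,

$$ \langle \varphi_k, (I - P)\varphi_k \rangle = \langle \varphi_k, (I - P^k)f \rangle $$
$$ = \sum_{i=0}^{k-1} \langle (I - P^k)P^i f, f \rangle = \langle 2\varphi_k - \varphi_{2k}, f \rangle, $$

where in the second line we have used the fact that $I - P^k$ is self-adjoint. Combining
this with (5.5) yields

$$ \frac{\|(I - P)\varphi_k\|_2^2}{\langle \varphi_k, (I - P)\varphi_k \rangle} \le \frac{4\|f\|_2^2}{\langle 2\varphi_k - \varphi_{2k}, f \rangle}. \tag{5.6} $$

The following claim will conclude the proof.

Claim: There exists a $k \in \mathbb{N}$ such that

$$ \langle 2\varphi_k - \varphi_{2k}, f \rangle \ge \frac{1}{8\theta}. \tag{5.7} $$

It remains to prove the claim. By assumption, $f$ satisfies $\|f\|_2 = 1$, and $\|Pf - f\|_2 \le
\theta$. Since $P$ is a contraction, we have $\|P^j f - P^{j-1} f\|_2 \le \theta$ for every $j \ge 1$, and
thus by the triangle inequality, $\|P^j f - f\|_2 \le j\theta$ for every $j \ge 1$. It follows by
Cauchy-Schwarz that $\langle f, (I - P^j)f \rangle \le j\theta$, therefore

$$ \langle f, P^j f \rangle \ge 1 - j\theta. $$

Thus for every $j \ge 1$,

$$ \langle \varphi_{2^j}, f \rangle \ge 2^j(1 - 2^j\theta). $$

Fix $\ell \in \mathbb{N}$ so that $2^\ell \theta \le \frac{1}{2} \le 2^{\ell+1}\theta$, yielding

$$ \langle \varphi_{2^\ell}, f \rangle \ge \frac{1}{8\theta}. \tag{5.8} $$

Now, let $a_m = \langle \varphi_{2^m}, f \rangle$, and write, for some $N > \ell$,

$$ \frac{a_N}{2^{N-\ell}} = a_\ell - \sum_{m=\ell}^{N-1} \frac{2a_m - a_{m+1}}{2^{m-\ell+1}}. $$

By (5.3), we have $\langle P^i f, f \rangle \to 0$ as $i \to \infty$, hence $\lim_{N \to \infty} \frac{a_N}{2^N} = 0$. Using (5.8) and
taking $N \to \infty$ on both sides above yields

$$ \frac{1}{8\theta} \le \sum_{m \ge \ell} \frac{2a_m - a_{m+1}}{2^{m-\ell+1}}. $$

Since $\sum_{m \ge \ell} 2^{-(m-\ell+1)} = 1$, there must exist some $m \ge \ell$ with $2a_m - a_{m+1} \ge \frac{1}{8\theta}$. This
establishes the claim (5.7) for $k = 2^m$ and, in view of (5.6), completes the proof of
the lemma. □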


6. Balanced random walks with interaction 


First we start with some examples.

6.1. Some examples.

Example 6.1. A martingale $(X_n)_n$ in $\mathbb{Z}^2$ moves horizontally at times $t \in
[2^k, 2^{k+1})$ with $k$ even and vertically at times $t \in [2^k, 2^{k+1})$ with $k$ odd (to nearest neighbours,
with equal probabilities in both cases).

Informally, this process is between one and two dimensional, as it has long
one-dimensional segments.

Claim 6.2. This process is transient.

Proof. In the $k$th horizontal segment, the process can only visit $x$ if it is on
the right horizontal line, which has probability $O(2^{-k/2})$. Since this is summable,
the process only visits $x$ finitely many times. Similarly for vertical segments. □

Example 6.3 (Benjamini-Kozma-Schapira [BKS11]). A martingale $(X_n)_n$ in
$\mathbb{Z}^2$ moves vertically on the first visit to each site, and horizontally on subsequent
visits.

Question 6.4. Is this recurrent or transient? [BKS11] includes this and several
other open problems of similar nature.

Example 6.5 (Nina Gantert; see Ofer Zeitouni's St. Flour
lecture notes on RWRE). On $\mathbb{Z}^2$ again, a martingale moves
horizontally with probability 2/3 (long arrows) and vertically
with probability 1/3 when $|x| < |y|$, and with opposite
probabilities otherwise (including $|x| = |y|$).

Proposition 6.6. This process is transient. 

For the proof we use the following basic results. 

Lemma 6.7. If a Markov chain on $S$ has a non-constant $\varphi : S \to \mathbb{R}^+$ with $P\varphi \le \varphi$
(pointwise) then the chain is transient.

Proof. $\varphi(X_t)$ is a non-negative super-martingale, and so must converge, which
contradicts recurrence. □

Lemma 6.8 (Excessive measure). If $\mu P \le \mu$ pointwise and $\mu P \ne \mu$ for a
positive measure $\mu$ on $S$, then $(X_t)$ is transient.

Proof. For any recurrent irreducible chain we have a stationary measure given
by $\pi(x) = E_a \sum_{t=0}^{\tau_a^+ - 1} 1_{X_t = x}$, where $\tau_a^+$ is the return time, and $a$ is an arbitrary
reference state. Consider the reverse chain with transitions $\hat p(x,y) = \frac{\pi(y)p(y,x)}{\pi(x)}$.
Then $\pi$ is also stationary for $\hat P$. Moreover $\hat P^t(a,a) = P^t(a,a)$, and so $\hat P$ is also recurrent.

In our case, the assumptions imply that $\varphi = \frac{\mu}{\pi}$ has $\hat P\varphi \le \varphi$. By Lemma 6.7 $\hat P$
is transient, and so $P$ must be transient as well. □

Proof of Proposition 6.6. Consider $\mu \equiv 1$. Then $\mu P \le \mu$ and is strictly
smaller at 0. □

6.2. Walks with few step distributions. [BKS11] raise the following questions.

Question 6.9. Fix two measures $\mu_1, \mu_2$ on $\mathbb{Z}^d$, $d \ge 3$, with mean 0 and bounded
support of full dimension. Consider a process that makes steps with law $\mu_2$ on the
first visit to a site, and $\mu_1$ on all subsequent visits. When is this recurrent/transient?

Question 6.10. More generally, what if the process moves from $X_t$ by $\mu_1$ or
$\mu_2$ and the choice is adapted to $\mathcal{F}_t$?

The next theorem answers these questions (from [PPS13]).

Theorem 6.11. Fix any two measures $\mu_1, \mu_2$ on $\mathbb{Z}^d$, $d \ge 3$, with mean 0 and
bounded support of full dimension. Let $(X_t)_t$ be a process such that conditioned on
$\mathcal{F}_t$ the step $X_{t+1} - X_t$ has law either $\mu_1$ or $\mu_2$. Then $(X)$ is transient.

In contrast, there are recurrent processes with three possible step distributions:

Example 6.12. In $\mathbb{Z}^3$, make a step of $\pm 1$ in the coordinate with maximal
absolute value with probability $1 - 2\varepsilon$, and in each of the other coordinates with
probability $\varepsilon$ each.

Theorem 6.13. This process is recurrent for $\varepsilon > 0$ small enough. In $\mathbb{Z}^d$ a
similar construction works with $d$ measures.


Compare this to a continuous diffusion with larger variance in the radial direction.
The absolute value is a Bessel process, and by adjusting the covariance
matrix, we can control the dimension and even make it less than 2, making the process
recurrent. The proof is based on careful construction of a Lyapunov function.


Proof of Theorem 6.11. First we investigate the case of a single increment
measure $\mu$. Let $Z$ have law $\mu$, and consider $M = \mathrm{Cov}(\mu) = E[ZZ^T]$. By applying
a linear map, we may assume this is a diagonal matrix $\mathrm{diag}(\lambda_i)$.

Let $\varphi(x) = |x|^{-2\alpha}$. Using a Taylor expansion we have

$$ \frac{\varphi(x + z)}{\varphi(x)} = 1 - \frac{2\alpha x^T z}{|x|^2} - \frac{\alpha |z|^2}{|x|^2} + \frac{2\alpha(\alpha + 1)(x^T z)^2}{|x|^4} + O(|x|^{-3}). $$

Taking expectation (with $E(Z) = 0$) we get

$$ E\left[\frac{\varphi(x + Z)}{\varphi(x)}\right] = 1 + \frac{\alpha}{|x|^4}\left(-E|Z|^2 |x|^2 + 2(\alpha + 1)x^T M x\right) + O(|x|^{-3}) $$
$$ = 1 + \frac{\alpha}{|x|^4} \sum_i x_i^2 \left(2(\alpha + 1)\lambda_i - \mathrm{tr} M\right) + O(|x|^{-3}). $$

If

$$ 2\lambda_{\max} < \mathrm{tr} M \tag{6.1} $$

and $\alpha > 0$ is sufficiently small then we get transience, since the sum is negative and
dominates the error term. We can truncate $\varphi$ so that the inequality holds for small
$x$ as well. Hence, transience follows from Lemma 6.7.

Clearly (6.1) is impossible for 2-dimensional matrices, so we need dimension at
least 3.

Note that if there are several increment laws $\mu_i$, the same $\varphi$ may be superharmonic
for all of them simultaneously. In that case, an arbitrary adapted choice
of $\mu_i$ for the steps does not affect transience.

For steps with a single law, we may consider instead the process $M^{-1/2}X$ which
has $\mathrm{Cov} = I$, and (6.1) holds.

For a pair of matrices, we can always ensure (6.1), hence transience is guaranteed:

Claim 6.14. For any pair of $3 \times 3$ symmetric positive definite matrices $M_1, M_2$
there is an $A$ so that $AM_iA^T$, $i = 1, 2$, both satisfy (6.1).

To see this, first apply some $A$ to make $M_1$ the identity, next diagonalize $M_2$
by a unitary matrix (thus keeping $M_1 = I$). If at this point $M_2 = \mathrm{diag}(a, b, c)$,
where we may order the coordinates so that $a \ge b \ge c$,
apply $A = \mathrm{diag}(\sqrt{b/a}, 1, 1)$ to finish, as the matrices are now $\mathrm{diag}(b/a, 1, 1)$ and
$\mathrm{diag}(b, b, c)$. □
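
The last step of this argument is easy to test numerically. The sketch below assumes the reduction has already been carried out, so that $M_1 = I$ and $M_2 = \mathrm{diag}(a, b, c)$ with $a \ge b \ge c > 0$, and represents diagonal matrices by tuples of their diagonal entries. The function names are hypothetical.

```python
from math import sqrt

def satisfies_6_1(diag):
    """Condition (6.1) for a diagonal matrix: 2 * lambda_max < trace."""
    return 2 * max(diag) < sum(diag)

def rescale_pair(a, b, c):
    """Apply A = diag(sqrt(b/a), 1, 1) to M1 = I and M2 = diag(a, b, c).

    Assumes a >= b >= c > 0. Returns the diagonals of A M1 A^T and A M2 A^T.
    """
    s = sqrt(b / a)
    m1 = (s * s, 1.0, 1.0)
    m2 = (s * s * a, float(b), float(c))
    return m1, m2
```

For instance, $\mathrm{diag}(5, 2, 1)$ violates (6.1) on its own, while both matrices returned by `rescale_pair(5, 2, 1)` satisfy it.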

Acknowledgement. We are grateful to Omer Angel, Jian Ding and Miki Racz for 
scribing some of these notes, and to Lucas Boczkowski and Perla Sousi for helpful 
corrections.